Engemann 2018

This study investigates the reliability of EEG markers in classifying states of consciousness in patients with disorders of consciousness (DOC) using machine learning techniques. An analysis of 327 recordings from patients and healthy controls across two research centers demonstrated that a non-parametric classifier, DOC-Forest, effectively generalizes across different EEG configurations and protocols, achieving a predictive area under the curve (AUC) of ~0.77. The findings indicate that multivariate pattern classification outperforms univariate methods, highlighting the potential for automated EEG analysis in clinical settings.

doi:10.1093/brain/awy251 BRAIN 2018: Page 1 of 14 | 1

Downloaded from https://academic.oup.com/brain/advance-article-abstract/doi/10.1093/brain/awy251/5114404 by guest on 05 October 2018

Robust EEG-based cross-site and cross-protocol classification of states of consciousness
Denis A. Engemann,1,2,3,* Federico Raimondo,3,4,5,6,* Jean-Rémi King,2,7,8
Benjamin Rohaut,3,9 Gilles Louppe,7 Frédéric Faugeras,3 Jitka Annen,10 Helena Cassol,10
Olivia Gosseries,10 Diego Fernandez-Slezak,4,5 Steven Laureys,10 Lionel Naccache,3,6
Stanislas Dehaene2,11 and Jacobo D. Sitt3,6

*These authors contributed equally to this work.

Determining the state of consciousness in patients with disorders of consciousness is a challenging practical and theoretical problem. Recent findings suggest that multiple markers of brain activity extracted from the EEG may index the state of consciousness in the human brain. Furthermore, machine learning has been found to optimize their capacity to discriminate different states of consciousness in clinical practice. However, it is unknown how dependable these EEG markers are in the face of signal variability because of different EEG configurations, EEG protocols and subpopulations from different centres encountered in practice. In this study we analysed 327 recordings of patients with disorders of consciousness (148 unresponsive wakefulness syndrome and 179 minimally conscious state) and 66 healthy controls obtained in two independent research centres (Paris Pitié-Salpêtrière and Liège). We first show that a non-parametric classifier based on ensembles of decision trees provides robust out-of-sample performance on unseen data with a predictive area under the curve (AUC) of ~0.77 that was only marginally affected when using alternative EEG configurations (different numbers and positions of sensors, numbers of epochs, average AUC = 0.750 ± 0.014). In a second step, we observed that classifiers based on multiple as well as single EEG features generalize to recordings obtained from different patient cohorts, EEG protocols and different centres. However, the multivariate model always performed best with a predictive AUC of 0.73 for generalization from Paris 1 to Paris 2 datasets, and an AUC of 0.78 from Paris to Liège datasets. Using simulations, we subsequently demonstrate that multivariate pattern classification has a decisive performance advantage over univariate classification as the stability of EEG features decreases, as different EEG configurations are used for feature-extraction or as noise is added. Moreover, we show that the generalization performance from Paris to Liège remains stable even if up to 20% of the diagnostic labels are randomly flipped. Finally, consistent with recent literature, analysis of the learned decision rules of our classifier suggested that markers related to dynamic fluctuations in theta and alpha frequency bands carried independent information and were most influential. Our findings demonstrate that EEG markers of consciousness can be reliably, economically and automatically identified with machine learning in various clinical and acquisition contexts.

1 Parietal project-team, INRIA Saclay – Île de France, France


2 Cognitive Neuroimaging Unit, CEA DSV/I2BM, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center,
91191 Gif sur Yvette, France
3 Inserm U 1127, CNRS UMR 7225, Institut du Cerveau et de la Moelle épinière, ICM, F-75013, Paris, France
4 Laboratorio de Inteligencia Artificial Aplicada, Departamento de Computación FCEyN, UBA, Argentina
5 CONICET – Universidad de Buenos Aires, Instituto de Investigación en Ciencias de la Computación, Godoy Cruz 2290,
C1425FQB, Ciudad Autónoma de Buenos Aires, Argentina
6 Sorbonne Universités, UPMC Université Paris 06, Faculté de Médecine Pitié-Salpêtrière, Paris, France
7 New York University, 6 Washington Place, New York, NY, USA
8 Frankfurt Institute for Advanced Studies, Frankfurt, Germany

Received February 1, 2018. Revised August 3, 2018. Accepted August 20, 2018.
© The Author(s) (2018). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved.
For permissions, please email: [email protected]

9 Department of Neurology, Columbia University, New York, NY, USA


10 Coma Science Group, GIGA Consciousness, University and University Hospital of Liège, Liège, Belgium
11 Collège de France, Paris, France

Correspondence to: Denis A. Engemann


1 Rue Honoré d’Estienne d’Orves
91120 Saclay, France
E-mail: [email protected]

Correspondence may also be addressed to: Federico Raimondo
ICM - Hôpital Pitié Salpêtrière, 47 bd de l’hôpital, 75013 Paris, France
E-mail: [email protected]

Jacobo Sitt
ICM - Hôpital Pitié Salpêtrière, 47 bd de l’hôpital, 75013 Paris, France
E-mail: [email protected]

Keywords: electroencephalography; disorders of consciousness; biomarker; machine learning; diagnosis


Abbreviations: AUC = area under the curve; DOC = disorders of consciousness; MCS = minimally conscious state; MVPA =
multivariate pattern analysis; UWS = unresponsive wakefulness syndrome; wSMI = weighted symbolic mutual information

Introduction

Patients suffering from disorders of consciousness (DOC) demonstrate that it is possible to be awake in the absence of behavioural evidence of consciousness (Laureys et al., 2010). Despite best efforts for consistency, current diagnostic procedures rely on human interaction and are, therefore, error-prone (Rohaut and Claassen, 2018). The degree of misdiagnosis in patients with DOC may exceed 40% when relying on the clinician’s judgement without standardized behavioural assessments (Schnakers et al., 2009). Even when using diagnostic instruments such as the Coma Recovery Scale-Revised (CRS-R) (Giacino et al., 2004), misdiagnosis can remain high if patients are not assessed repeatedly within a short time window (Wannez et al., 2017). Furthermore, in some cases evidence of conscious processing in these patients can only be obtained using functional neuroimaging where patients sometimes demonstrate wilful modulations of their brain activity (Owen et al., 2006; Monti et al., 2010). These patients have been labelled as ‘covert awareness’ or ‘cognitive motor dissociation (CMD)’ patients (Gosseries et al., 2014; Schiff, 2015; Curley et al., 2018).

Among the DOC one distinguishes the comatose state, the unresponsive wakefulness syndrome (UWS, historically vegetative state), and the minimally conscious state (MCS) (Giacino et al., 2002; Laureys et al., 2010). The presence of eye-opening helps to distinguish UWS patients from comatose ones (Jennett and Plum, 1972). Additionally, MCS but not UWS patients show signs of awareness (i.e. visual pursuit in MCS– and command following in MCS+) (Bruno et al., 2011) while neither achieving functional communication nor object use. It is nevertheless believed that these patients can have a partial and fluctuating awareness of themselves and their surroundings and are more likely to recover (Luauté et al., 2010; Faugeras et al., 2018), which emphasizes the importance of reliable diagnostic tools.

In the past two decades, non-invasive brain imaging has supplemented behavioural assessments for detection of consciousness. Sleep studies and neurological assessments have early on revealed preferentially altered EEG amplitudes in the delta (2–4 Hz), theta (4–8 Hz) and alpha (8–12 Hz) frequency ranges (Emmons and Simon, 1956; Rosenberg et al., 1977). PET revealed globally decreased glucose uptake in patients with DOC as compared to healthy controls (Stender et al., 2014). Several functional MRI studies have documented disruption of functional connectivity along diverse subcortical and neocortical pathways in patients with DOC (Demertzi et al., 2014). Ever since, advances in cognitive science have allowed one to infer consciousness from increasingly fine-grained patterns of brain activity. Accordingly, recurrent interactions between higher-order neocortical networks, as well as the morphology and complexity of brain dynamics in response to stimulation have been related to the states of consciousness (Tononi and Edelman, 1998; Dehaene and Naccache, 2001; Casali et al., 2013; Iotzov et al., 2017), which has led to various types of putative markers of consciousness.

Following recent trends in neuroimaging, the increasing number of neural markers of consciousness is likely to be best approached with multivariate pattern analysis (MVPA) (Naci et al., 2012; King et al., 2013b; Claassen et al., 2016). Indeed, machine learning algorithms can be trained to best predict the medical status of individual patients from unknown combinations of physiological markers (for example, Chang et al., 2005). Typically, a classifier is trained to optimally discriminate clinical labels based on brain data. Generalization performance is then assessed by comparing the predictions of the classifier to the actual diagnosis when presented with unseen data. In the absence of independent datasets, cross-validation is performed to estimate

the out-of-sample performance by subdividing the data into training and testing sets and averaging over testing set scores. It is, however, noteworthy that cross-validation tends to be too optimistic when sample sizes are small (Varoquaux et al., 2016; Varoquaux, 2018; Woo et al., 2017), rendering face-value interpretation of scores futile for a significant proportion of neuroimaging studies. Examples of MVPA for the study of patients with DOC include the analysis of patterns of resting state functional MRI functional connectivity (Demertzi et al., 2015), spectral responses to command following (Goldfine et al., 2011; Cruse et al., 2012) and cerebral metabolism to distinguish locked-in patients from UWS (Phillips et al., 2011).

In this context, EEG is particularly interesting as this neurophysiological technique conveys rich temporal information on cognitive operations and can be economically operated in a wide range of situations, potentially enabling bedside or home assessment. The challenge of processing large amounts of EEG data at scale can nowadays be addressed using automated EEG processing methods (Engemann et al., 2015; Jas et al., 2017). However, preferences for cognitive theories and EEG methodologies are heterogeneous across laboratories, which significantly obstructs the development of large-scale data resources well suited for high-fidelity machine learning. The emerging EEG markers, so far, fall into four conceptual families. Evoked markers are based on time-locked event-related analysis of cognitive experiments. The other families contain markers defined independently from protocols, including, connectivity markers exploiting brain–network interactions, information theory markers capitalizing on information properties of time series and spectral markers quantifying neuronal oscillations or stochastic band-limited dynamics. Yet, the situation is further complicated by the fact that DOC reflect several cognitive and neurological components rather than a single dimension, motivating the consideration of marker profiles (Bayne et al., 2016; Sergent et al., 2017).

In a recent study, using a support vector machine (SVM) classifier, Sitt et al. (2014) analysed dozens of EEG markers from more than 150 high-density EEG recordings during an auditory novelty task. Interestingly, combinations of markers synergistically outperformed single markers. Similarly, using graph-theoretical summaries of alpha-band connectivity, Chennu et al. (2017) presented an alternative SVM approach cross-validated on 104 patients with severe brain injury (among those 89 with DOC).

Nevertheless, a generalized large-scale attempt for cross-laboratory predictions of state of consciousness in brain-injured patients is missing, and several practical questions remain unanswered: what is the optimal duration for individual EEG recordings? Which task should the patient undergo, if any? How many sensors should be used, and where should they be located? Can a single machine learning algorithm perform on data from different clinical centres? Do models based on current EEG markers achieve prospective generalization on independent data (Woo et al., 2017)? Are single markers sufficiently powerful and when does multivariate classification provide the clearest advantage?

To address these questions, we rigorously probed the robustness and validity of EEG markers of consciousness. Using the robust Extra-Trees algorithm (Geurts et al., 2006) we developed a classifier to differentiate UWS from MCS patients (which we termed ‘DOC-Forest’). This classifier was trained and tested using 28 potential EEG markers of consciousness from 249 patients recorded at the Paris Pitié-Salpêtrière and 78 patients from the University Hospital of Liège. We first show that different EEG configurations (sensor number, sensor position and numbers of epochs) and experimental protocols (auditory stimulation or resting state) induce significant changes in the distribution and performance of the EEG markers. Yet, we found that the DOC-Forest is relatively immune to such variations by exploiting the information conveyed by reliable EEG markers. We subsequently demonstrate out-of-sample generalization to two independent datasets: a new cohort of 107 task-EEG recordings (not previously analysed) from Paris and 78 resting state EEG recordings from the University Hospital of Liège. Moreover, we show that our DOC-Forest’s generalization performance is decisively superior to univariate markers. Finally, by investigating the influence of individual markers on the decisions of DOC-Forest, we found that alpha-band power, theta-band connectivity and time series complexity carry complementary information about states of consciousness.

Materials and methods

Ethics statement

This research project was approved by the ethical committee of the Pitié-Salpêtrière hospital under the code ‘Recherche en soins courants’ (routine care research). All investigations were carried out in accordance with the Declaration of Helsinki on ethical principles for medical research involving human subjects. For the dataset from the Coma Science Group, the family of the patient gave their informed consent for participation in the study, and the Ethics Committee of the University hospital of Liège approved the study.

Participants

In total, 327 EEG recordings from 268 distinct patients from our expert centres were included in the current study (Table 1). Patients were assessed at variable delays (sub-acute or chronic stage following the brain injury) in order to clarify the actual state of consciousness. Clinical assessments were performed at least three times in the Paris dataset and five times in the Liège dataset, in all cases on different days by trained clinicians (see ‘Acknowledgements’ section) and included systematically the CRS-R. CRS-R scores range from 0 to 23 and reflect the presence or absence of response on a set of hierarchically ordered items testing auditory, visual, motor,

Table 1 Patient characteristics in the three datasets

                               Auditory local global task          Resting state
                               Paris 1          Paris 2            Liège
n(EEG)                         142              107                78
n(patients)                    98               92                 78
n(UWS)                         75               52                 21
n(MCS)                         67               55                 57
Gender ratio, male/female      2.06             1.93               1.26
Age, years, mean (SD)          46.5 (17.8)      45.4 (17.7)        38.0 (14.3)
Delay, days, mean (SD)         126.0 (372.9)    299.6 (823.6)      1040.6 (1227.6)
Delay, days, min–max           6–2611           8–6570             11–5380
Anoxia, %                      29.6             30.4               21.7
Stroke, %                      29.6             15.2               3.84
Traumatic brain injury, %      23.5             28.2               48.1
Other, %                       18.4             29.4               21.8

oromotor, communication, and arousal functions (Giacino et al., 2004). According to the best assessment, each patient was diagnosed with UWS or MCS. The data acquisition protocol included, in all centres, multiple clinical assessments and at least one EEG recording. For some patients, several EEG recordings were available, which we later accounted for by statistical modelling. The number of recordings varied considerably across datasets; however, the ratio of MCS to UWS patients was roughly balanced. Across all datasets more male than female patients were observed. Age distributions were similar; however, the delay from accident was visibly higher for the resting state dataset. Likewise, the distribution of aetiologies was different for the resting state dataset while proportions were consistent with the literature.

Experimental paradigm

In the Paris 1 and 2 datasets, task-related EEG signals were obtained using the ‘Local-Global’ protocol (Bekinschtein et al., 2009) designed to study unconscious and conscious auditory processing. In the Liège dataset, EEG recordings were task-free (see the online Supplementary material for details).

Selection and computation of putative EEG markers of consciousness

We extracted 28 putative EEG biomarkers detailed in Sitt et al. (2014). The markers can be grouped into four conceptual families, i.e. information theory, connectivity, spectral, and evoked response markers (Table 2). Among several connectivity metrics described in Sitt et al. (2014), we only considered the weighted symbolic mutual information (wSMI) metric in theta frequency band as previous research had suggested that the long-range connectivity patterns theoretically related to consciousness are most robustly and accurately assessed by this metric (King et al., 2013a). Note that for the analysis of resting state EEG we did not make use of the evoked response markers as those are only defined for the task used in the Paris datasets. For a detailed description and discussion of the markers, see Sitt et al. (2014).

The markers commonly used in clinical neuroscience are often defined at a general level and can be observed over multiple electrodes, time points or frequency bands. To delineate low-level features, we computed four summary statistics from each marker (Fig. 1). To summarize epochs, we either computed the 80% trimmed mean, or the standard deviation (SD). The sensor dimension was then summarized using a mean or the standard deviation, yielding four combinations in total (Fig. 1A). Throughout the manuscript we refer to these marker subtypes as ‘mean,mean’, ‘std,mean’, ‘mean,std’ and ‘std,std’ and in figures, for brevity, ‘m,m’, ‘s,m’, ‘m,s’, ‘s,s’. For a full list and abbreviations, see Table 2.

Computation was carried out using a designated Python software library implementing the biomarker extraction functionality from Sitt et al. (2014). The extracted markers closely matched the original values and group results for the reference datasets were qualitatively reproduced (Engemann et al., 2015).

Statistical analysis

Classification of disorders of consciousness from EEG markers

Diagnosis was classified based on EEG markers using a univariate and a multivariate machine learning strategy. To enable comparisons across studies, we also computed model-free performance on single markers as in Sitt et al. (2014). Performance was assessed using the area under the curve (AUC). For details see Supplementary material ‘Area under the curve metric’ section. For multivariate and univariate pattern analysis, we chose the Extra-Trees algorithm (Geurts et al., 2006) whose non-parametric design facilitates robust classification. To complement insights from univariate classification, we extracted the so-called variable importance metric from the Extra-Trees following best practice recommendations for enhanced interpretability (Louppe et al., 2013; Louppe, 2014). Accordingly, our variable importance scores reflect mutual information between a variable and the diagnosis while conditioning out the other variables. For background information on parameters and model tuning, see Supplementary material ‘Multivariate pattern classification’ section. To use a common currency when comparing

Table 2 Potential EEG biomarkers of consciousness

Abbreviation   Marker                                                     Conceptual family     Protocol
PE             Permutation entropy                                        Information theory    Task, rest
K              Kolmogorov complexity                                      Information theory    Task, rest
wSMI θ         Weighted symbolic mutual information                       Connectivity          Task, rest
α              Alpha PSD                                                  Spectral              Task, rest
|α|            Normalized alpha PSD                                       Spectral              Task, rest
β              Beta PSD                                                   Spectral              Task, rest
|β|            Normalized beta PSD                                        Spectral              Task, rest
δ              Delta PSD                                                  Spectral              Task, rest
|δ|            Normalized delta PSD                                       Spectral              Task, rest
γ              Gamma PSD                                                  Spectral              Task, rest
|γ|            Normalized gamma PSD                                       Spectral              Task, rest
θ              Theta PSD                                                  Spectral              Task, rest
|θ|            Normalized theta PSD                                       Spectral              Task, rest
MSF            Median power frequency                                     Spectral              Task, rest
SE90           Spectral entropy 90                                        Spectral              Task, rest
SE95           Spectral entropy 95                                        Spectral              Task, rest
SE             Spectral entropy                                           Spectral              Task, rest
CNV            Contingent negative variation                              Evoked                Task
P1             Short-latency auditory potential to the first sound        Evoked                Task
P3a            Mid-latency auditory potential to the first sound          Evoked                Task
P3b            Mid-latency auditory potential to the first sound          Evoked                Task
GD–GS          Full contrast                                              Evoked                Task
LD–LS          Full contrast                                              Evoked                Task
LSGD–LDGS      Full contrast                                              Evoked                Task
LSGS–LDGD      Full contrast                                              Evoked                Task
MMN            Contrasted MMN (local deviant versus local standard)       Evoked                Task
P3a            Contrasted P3a (local deviant versus local standard)       Evoked                Task
P3b            Contrasted P3b (global deviant versus global standard)     Evoked                Task

GD = global deviant; GS = global standard; LD = local deviant; LS = local standard; MMN = mismatch negativity; PSD = power spectral density.
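
To make one of the information theory markers in Table 2 more concrete, the following is a minimal sketch of normalized permutation entropy in its textbook form: count the ordinal patterns of short windows of a signal and take the Shannon entropy of their distribution. It is not the implementation used in the study (see Sitt et al., 2014, for the actual marker definitions); the function name and the `order` and `delay` parameters and their defaults are assumptions chosen for illustration.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3, delay=1):
    """Normalized permutation entropy of a 1D sequence.

    Illustrative sketch only: `order` (window length) and `delay` (spacing
    between samples in a window) are hypothetical defaults, not the study's.
    Returns 0 for a perfectly regular ordering and 1 when all ordinal
    patterns are equally frequent.
    """
    n_windows = len(signal) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("signal too short for the requested order and delay")
    patterns = Counter()
    for start in range(n_windows):
        window = [signal[start + k * delay] for k in range(order)]
        # The ordinal pattern is the index order that sorts the window.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        patterns[pattern] += 1
    # Shannon entropy (in bits) of the pattern distribution.
    entropy = sum((count / n_windows) * math.log2(n_windows / count)
                  for count in patterns.values())
    # Divide by the maximum entropy, log2(order!), to obtain a value in [0, 1].
    return entropy / math.log2(math.factorial(order))


# A strictly increasing ramp contains a single ordinal pattern.
ramp_entropy = permutation_entropy(list(range(50)))
# A strictly alternating signal uses both patterns of order 2 equally often.
alternating_entropy = permutation_entropy([0, 1, 0, 1, 0, 1, 0, 1, 0], order=2)
```

In this sketch a more irregular time series yields a value closer to 1, which is the sense in which such markers quantify the complexity of the EEG signal.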

univariate with multivariate marker performance, we turned single markers into fully functional classification models by using the identical recipe as for the DOC-Forest, effectively only changing the features passed to the classifier. This allowed us to predict the probability of DOC diagnosis from single markers using the same framework as for multivariate analysis.

Statistical inference

We extended our visualizations into hypothesis tests by using the percentile bootstrap (Efron and Tibshirani, 1993) (Supplementary material). To assess out-of-sample generalization we used two complementary approaches: a conservative validation on independent data (new cohorts, different protocols and laboratories) and cross-validation (Supplementary material).

Software

All data were processed using the Python programming language. To simplify preprocessing and feature extraction for machine learning, we developed a designated software library (available at https://github.com/nice-tools/nice) built on top of the open source software libraries MNE (Gramfort et al., 2014) and scikit-learn (Pedregosa et al., 2011). The DOC-Forest recipe is publicly available (https://github.com/nice-tools/nice) to encourage community efforts in building predictive models of DOC patients’ state of consciousness.

Data availability

The clinical data used in this paper can be made available upon reasonable request, but because of the sensitive nature of the clinical information concerning the patients the ethics protocol does not allow open data sharing.

Results

Robust detection of state-of-consciousness from EEG features

Multivariate classification of UWS versus MCS is robust across EEG configurations

The DOC-Forest classifier exhibited an average performance of AUC = 0.75 (SD = 0.014) and performed better and more robustly than most other markers did individually (Fig. 2A, B, Supplementary Figs 1 and 2). Moreover, its discrimination performance increased with the number of sensors (rhoSpearman = 0.803, 95% CI: 0.646–0.891; P < 0.001) and epochs (rhoSpearman = 0.40, 95% CI: 0.07–0.668; P < 0.05) (Fig. 2B) but was already strong with 16 sensors and 5% of epochs. Importantly, using the full EEG configuration, the performance closely resembled previous
Figure 1 Extraction of EEG features. (A) The EEG markers fell into four conceptual families, i.e. spectral, information theory, connectivity
and evoked responses. When computing the markers from the preprocessed EEG, we obtained several observations for channels, epochs, time
points and frequency bins, depending on the family. Following Sitt et al. (2014), we extracted four features from each marker (indicated by the red
dots) by summarizing the observations systematically: we computed either the mean or the standard deviation first across epochs (1) and then
across sensors (2). If a third dimension was present (3), we summarized it using the mean. We, hence, referred to the ensuing four features as
‘mean,mean’, ‘mean,std’, ‘std,mean’ and ‘std,std’. (B) We repeated this process using six alternative sensor configurations (256, 128, 64, 32, 16, 8)
and six alternative percentages of consecutive epochs (1, 5, 25, 50, 75, 100) with about seven epochs at 1% and about 700 epochs at 100%.
Sensors were selected such that they approximated realistic EEG caps respecting the international 10-20 system. Selection of epochs respected
the relative proportions of conditions used in the task. This allowed us to compute markers based on experimental contrasts at any point. This
yielded 36 alternative EEG configurations. D = deviant; freq. = frequency; G = global; L = local; S = standard; sens. = sensor; std = standard
deviation.
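
The two-step summary described in this caption can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the study's code: the epochs-by-sensors layout, the function names, the reading of the 80% trimmed mean mentioned in the Methods as dropping 10% of the values at each tail, the use of the population standard deviation, and the convention that the first word of each feature name refers to the epoch summary are all assumptions.

```python
import statistics


def trimmed_mean(values, cut=0.1):
    """Mean after dropping a fraction `cut` of the sorted values at each tail.

    Assumption: an '80% trimmed mean' keeps the central 80% (cut = 0.1).
    """
    ordered = sorted(values)
    n_cut = int(len(ordered) * cut)
    return statistics.fmean(ordered[n_cut:len(ordered) - n_cut])


def summarize_marker(marker):
    """Reduce one marker to four scalar features.

    `marker` is a list of epochs, each a list with one value per sensor
    (hypothetical layout). Epochs are summarized first, sensors second.
    """
    n_sensors = len(marker[0])
    per_sensor = [[epoch[s] for epoch in marker] for s in range(n_sensors)]
    # Step 1: summarize across epochs, separately for every sensor.
    epoch_summaries = {
        "mean": [trimmed_mean(values) for values in per_sensor],
        "std": [statistics.pstdev(values) for values in per_sensor],
    }
    # Step 2: summarize each per-sensor vector across sensors.
    features = {}
    for epoch_stat, values in epoch_summaries.items():
        features[f"{epoch_stat},mean"] = statistics.fmean(values)
        features[f"{epoch_stat},std"] = statistics.pstdev(values)
    return features


# Toy marker: 10 epochs, 2 sensors with constant values 1.0 and 3.0.
toy_marker = [[1.0, 3.0] for _ in range(10)]
toy_features = summarize_marker(toy_marker)
```

Applied to every marker, a reduction of this kind turns each recording into a fixed-length feature vector regardless of how many sensors or epochs were available, which is what allows one classifier to be compared across the 36 EEG configurations.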

results reported by Sitt et al. (2014) and beat any other marker (Supplementary Fig. 2). These results suggest that the DOC-Forest preferentially tracks information conveyed by a few robust markers over a variety of EEG configurations.

Using the full configuration, we subsequently assessed the consistency of classification success for different aetiological groups and different levels of chronicity (Supplementary material ‘Consistency of classification results in diagnostic subgroups’ section). Comparable results were obtained for the chronic (delay > 30 days) and acute (delay ≤ 30 days) groups. The classification performance was significant for all the aetiology groups (i.e. anoxia, stroke and traumatic brain injury). Yet, in the case of traumatic brain injury patients the performance was slightly lower, suggesting that the heterogeneity of this group makes it more difficult to classify. For additional fine-grained comparisons between single markers and the DOC-Forest, see Supplementary material ‘Detailed comparison between individual markers and DOC-Forest’ section.

Classification is preferentially driven by distinct theta- and alpha-band dynamics

While it is not convenient to reason separately about each of the 2000 decision trees grown inside our DOC-Forest, we can still analyse the relative contributions of EEG markers to classification performance by considering the variable importance. This multivariate metric approximates the mutual information between a marker and the diagnosis while controlling for the contribution of other markers. The variable importance can deviate systematically from the univariate AUC whenever information is shared between markers or the model has identified non-linear interaction effects. Inspecting all DOC-Forest classifiers for the 36 configurations, we observed that markers contributing most strongly on average belonged to different conceptual families (Fig. 2C). Specifically, permutation entropy and long-range connectivity in the theta band and the alpha frequency band power were top ranked in terms of univariate discrimination and variable importance. In contrast, evoked markers, on average, often assumed values below 0.89%, which is less than would be expected if all markers were equally influential. We observed a positive but non-linear relationship between average AUC and average variable importance (rhoSpearman = 0.817, 95% CI: 0.727–0.880; P < 0.001). It can be seen that highly performing markers were disproportionally more important than expected for a linear association (Fig. 2C).

Exploiting invariant EEG features of consciousness for generalization

Generalization to independent data, protocols and configurations

Here we considered two independent cohorts: 107 task-EEG recordings from the Paris Pitié-Salpêtrière Hospital
Figure 2 Performance of EEG markers of consciousness across different EEG configurations. (A) Performance distribution over
markers (grey: model-free in-sample performance; blue: cross-validation with univariate forests) and the multivariate DOC-Forest pattern
classifier (red) across 36 EEG configurations on the Paris 1 dataset. (B) DOC-Forest tended to improve as more epochs and sensors were used.
Although optimal performance was achieved with 128 electrodes, reasonable performance could still be obtained with only 16 electrodes and a
minimum of epochs. (C) Cross-validated univariate performance as a function of multivariate variable importance in the DOC-Forest, both
averaged across EEG-configurations. Marker subtype and conceptual family are indicated by shape and colour, respectively. A positive but non-linear relationship emerged. The best univariate markers were disproportionally more important to the DOC-Forest than a linear relationship would predict. It is noteworthy that markers from the spectral, connectivity and information theory families had the highest univariate performance and were assigned the highest importance by the classifier while the evoked markers were systematically less important. See also Table 2; m,m
= mean,mean; m,s = mean,std; PE = permutation entropy; sens. = sensor; s,m = std,mean; s,s = std,std.
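
The monotonic but non-linear association described in panel C can be illustrated with a rank correlation. The following is a minimal sketch, not the authors' analysis code: the AUC and importance values are made up for illustration, and the helper names `ranks`, `pearson` and `spearman` are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation (sensitive to the shape of the relationship)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative (made-up) values: importance grows faster than linearly
# with univariate AUC, as a convex monotone function.
univariate_auc = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
importance = [(a - 0.45) ** 3 for a in univariate_auc]

rho = spearman(univariate_auc, importance)
r = pearson(univariate_auc, importance)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

Because the rank correlation only depends on ordering, it stays maximal for any strictly increasing relationship, whereas the linear correlation drops when importance rises disproportionally at the top of the distribution.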

(Paris 2) and 78 resting state EEG recordings by an independent research group (Coma Science Group, Liège, Belgium; see Table 2 for an overview). When training the DOC-Forest on the Paris 1 dataset, and testing the algorithm on the Paris 2 dataset, each time using the full EEG configuration, we observed significant classification performance with an AUC around 0.73 [standard deviation (SD) = 0.05, 95% CI: 0.63–0.82] (Fig. 3A). Likewise, when trained on all available data from Paris (Paris 1 and Paris 2) but ignoring the evoked markers (Table 1 and Fig. 1A), the DOC-Forest scored an AUC of 0.78 (SD = 0.06, 95% CI: 0.66–0.89) on the Liège resting state data (Fig. 3B).

We subsequently assessed generalization of our classifier trained on the Paris dataset to distinguish UWS versus MCS to a dataset of 66 conscious controls. The DOC-Forest classified 94% of the controls (Paris local-global paradigm: 34 of 36, Liège resting state: 28 of 30) as MCS. This result suggests that the patterns used by the classifier to distinguish UWS versus MCS patients extrapolate to normal controls.

Furthermore, we detected two cognitive-motor dissociation patients in the Liège dataset. These patients were originally labelled as UWS from their behaviour but showed evidence of conscious processing using an active functional MRI paradigm (see the Supplementary material for a brief description of the two cases). Both cases were classified as MCS by DOC-Forest.

Generalization using univariate markers

Less consistent results were obtained when using univariate forests based on the markers from the connectivity, information theory and spectral families, which showed the highest cross-validation performance on the training set. For Paris 1 (Fig. 3A) these were wSMI (mean,mean), theta permutation entropy (mean,mean) and normalized alpha power (std,mean) with scores of 0.75, 0.74 and 0.77, respectively. For the combined Paris 1 and 2 dataset these were: theta wSMI (std,mean), theta permutation entropy (mean,mean) and alpha band power (mean,mean) with cross-validated scores of 0.69, 0.69 and 0.73, respectively. All univariate models showed lower generalization performance (0.04 to 0.14 AUC points) compared to the DOC-Forest and only the alpha band classifiers performed meaningfully better than a dummy classifier (Fig. 3, middle). Comparing the variable importance to each marker’s out-of-sample performance, again, revealed positive non-linear associations (Fig. 3A and B, right, Spearman's ρ for Paris 1→2 = 0.477, 95% CI: 0.312–0.620; P < 0.001; Spearman's ρ for Paris→Liège = 0.521, 95% CI: 0.309–0.684; P < 0.001). The display reveals that several univariate models showed reasonable generalization performance with AUC values beyond 0.70. Highly performing markers were disproportionally more important for the DOC-Forest than would have been expected assuming a linear relationship.

Strikingly, generalization was even successful when different EEG configurations were combined, e.g. training with 100% of the epochs and 32 sensors and testing with 50% of the epochs and eight sensors, although this induced decodable differences between training and testing sets (Supplementary Fig. 3). On average, the DOC-Forest performed significantly higher than any of the three corresponding univariate forests (Table 3). Inspection of the

Figure 3 Generalization between datasets and protocols. (A) Generalization from the Paris 1 cohort to 107 new EEG recordings from
Paris (task-EEG in both cases). Left: The ROC curves for the multivariate DOC-Forest and three univariate forests based on the feature that
performed best (cross-validation) on the training set corresponding to the connectivity, information and spectral families. Middle: Bootstrap
distributions of improvements over a dummy classifier based on paired differences, ordered by performance. Positive values indicate performance
better than the dummy model. Boxplot whiskers show the 95% CI. Right: The generalization performance of each marker against training-set
importance. The 10 most important features are labelled for convenience. (B) Generalization from 249 task-EEG (Paris 1 + Paris 2) to 78 resting
state EEG recordings (Liège) depicting an equivalent analysis as in A but not including the evoked response features. The results suggest
meaningful prospective generalization for the DOC-Forest while the univariate models were overall less successful. See also Table 2. m,m =
mean,mean; m,s = mean,std; PE = permutation entropy; sens. = sensor; s,m = std,mean; s,s = std,std.
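
The paired bootstrap comparison against a dummy classifier shown in the middle panels can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `auc` and `paired_bootstrap_diff`, the percentile interval and the synthetic scores are all assumptions. The AUC is computed as the Mann-Whitney U statistic divided by the number of positive-negative pairs.

```python
import random


def auc(scores, labels):
    """AUC as the Mann-Whitney U statistic divided by n_pos * n_neg."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_diff(scores_a, scores_b, labels, n_boot=1000, seed=0):
    """Bootstrap AUC(a) - AUC(b), resampling the same recordings for both."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:  # AUC is undefined without both classes
            continue
        a = auc([scores_a[i] for i in idx], y)
        b = auc([scores_b[i] for i in idx], y)
        diffs.append(a - b)
    diffs.sort()
    lo = diffs[int(0.025 * n_boot)]
    hi = diffs[int(0.975 * n_boot) - 1]
    return sum(diffs) / n_boot, (lo, hi)


# Synthetic example (made-up data): an informative classifier versus a
# dummy classifier that outputs a constant score (AUC of exactly 0.5).
data_rng = random.Random(42)
labels = [i % 2 for i in range(40)]
informative = [y + data_rng.gauss(0.0, 0.8) for y in labels]
dummy = [0.5] * len(labels)

mean_diff, (ci_lo, ci_hi) = paired_bootstrap_diff(informative, dummy, labels)
print(f"mean improvement = {mean_diff:.3f}, 95% CI: {ci_lo:.3f} to {ci_hi:.3f}")
```

Resampling the same indices for both models keeps the comparison paired, so recording-level variability that affects both classifiers equally cancels out in the difference.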

cross-configuration generalization patterns revealed that the performance changes were far from random, favouring specific but distinct combinations of sensors and epochs for both generalization tasks (Supplementary Fig. 4).

Robustness to noise

As the DOC-Forest seemed resilient to mismatching EEG configurations, we conducted a computational stress-test by adding noise to the markers in the testing set until classification broke down (Fig. 5A). Unsurprisingly, across generalization tasks, the univariate classifiers collapsed earlier at signal-to-noise ratios (SNRs) between 1/10 and 1/100, whereas the DOC-Forest endured longer, eventually failing at SNR values of 1/1000. Another concern potentially limiting generalization performance is the quality of the diagnostic information. We empirically assessed in a second computational stress-test the stability of generalization from Paris to Liège in the face of increasingly inaccurate diagnostic training labels (Fig. 5B). By design, this simulation forced the DOC-Forest to collapse and eventually yield systematically wrong predictions. However, the classifier still delivered reasonable predictions even if up to 30% of the diagnostic labels were flipped. Moreover, the literature would predict between 6% and 17% of misdiagnoses (Wannez et al., 2017) for the three to five CRS-R repetitions used in this study, which, here, fall into the range of resilient generalization. These results demonstrate that the DOC-Forest is not only relatively robust to noise in the data but also to noise in the diagnostic labels.

Discussion

We evaluated the robustness to different EEG configurations and recording conditions of univariate and multivariate pattern classification based on 28 putative EEG biomarkers of consciousness using the Extra-Trees algorithm. To the best of our knowledge, our study represents the most extensive

Table 3 Average generalization performance over different EEG configurations

Generalization    Contrast                       Difference      95% CI
Paris 1 → 2       DOC-Forest − wSMI θ (m,m)      Δ = 0.124***    0.122–0.125
Paris 1 → 2       DOC-Forest − PE θ (m,m)        Δ = 0.097***    0.096–0.098
Paris 1 → 2       DOC-Forest − |α| (s,m)         Δ = 0.035***    0.033–0.037
Paris → Liège     DOC-Forest − wSMI θ (s,m)      Δ = 0.140***    0.139–0.142
Paris → Liège     DOC-Forest − PE θ (m,m)        Δ = 0.118***    0.115–0.120
Paris → Liège     DOC-Forest − α (m,m)           Δ = 0.035***    0.034–0.037

***P < 0.001.
See also Table 2. m,m = mean,mean; m,s = mean,std; PE = permutation entropy; s,m = std,mean.

Figure 4 Generalization between datasets and protocols when EEG configurations differ. (A) Generalization from Paris 1 to Paris 2
when 1296 different combinations of EEG configurations were used for training and testing (six sensor × six epoch configurations for each set).
The same univariate forest models as in Fig. 3 were considered next to the multivariate DOC-Forest. The distribution of AUC scores is indicated
by the histograms, single observations are indicated by the rug plot. The orange solid lines indicate the mean of the distribution, the orange dotted
line the performance when the reference configuration of 100% epochs and 256 sensors is used on both training and testing. (B) The same
analysis for the generalization from the joint Paris 1 and 2 dataset to the Liège dataset. It can be seen that, on average, the DOC-Forest
outperforms any of the univariate models. See also Table 2. m,m = mean,mean; m,s = mean,std; PE = permutation entropy; sens. = sensor; s,m =
std,mean; s,s = std,std.
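
The count of 1296 follows from pairing each of the 36 training configurations with each of the 36 testing configurations. The enumeration below is a minimal sketch: the specific sensor counts and epoch fractions are hypothetical placeholders (only some of these values are mentioned in the text), and only the 6 × 6 structure per set is taken from the caption.

```python
from itertools import product

# Hypothetical grids: six sensor counts and six epoch fractions per set.
SENSORS = (8, 16, 32, 64, 128, 256)
EPOCH_FRACTIONS = (0.01, 0.05, 0.10, 0.25, 0.50, 1.00)

# One EEG configuration = (number of sensors, fraction of epochs).
configs = list(product(SENSORS, EPOCH_FRACTIONS))

# Every training configuration is paired with every testing configuration.
train_test_pairs = list(product(configs, configs))

# Pairs in which training and testing configurations differ.
mismatched = [(tr, te) for tr, te in train_test_pairs if tr != te]

print(len(configs), len(train_test_pairs), len(mismatched))
```

In a full analysis, each pair would index one fit-and-score run; the reference configuration (100% of epochs, 256 sensors on both sides) is just one of the 36 matched pairs.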

validation of a machine learning approach to diagnose UWS versus MCS patients for two reasons. First, our findings are based on the currently largest EEG dataset of patients suffering from DOC, comprising 327 recordings. Second, in the context of DOC, the present study is the first to demonstrate prospective generalization of multivariate pattern classification between different centres, EEG configurations, and protocols. We demonstrated that robust generalization can be achieved despite non-trivial changes in the spatio-temporal configuration of the EEG and that this generalization can be resistant to a certain degree of uncertainty in the training labels (up to 20%). We showed that by relying on a robust classification algorithm, meaningful generalization could be achieved even if the performance of individual markers varied systematically between datasets. While certain EEG markers, i.e. alpha band power and its fluctuations, turned out to be useful as stand-alone classifiers, we found that the advantage of multivariate over univariate classification was most striking when systematic differences between the training and testing sets were present. Moreover, we found the DOC-Forest to preferentially base its predictions on diverse aspects of alpha and theta frequency band dynamics. Importantly, our results show that EEG-markers of consciousness can be accessed equivalently from task and resting state EEG.

Robust learning of UWS versus MCS diagnosis from EEG markers of consciousness

Our results demonstrate that diagnosis of UWS versus MCS patients can be robustly inferred from multivariate pattern classification using a wide array of EEG configurations (Fig. 2A and B). This was also the case with a minimum of sensors (~16) and epochs (10–50) and even when EEG configurations differed on the training and testing data (Fig. 4, Supplementary Figs 3 and 4), e.g. when training on 10% of the epochs with eight sensors and testing on all epochs with 256 sensors. We observed that many

Figure 5 Computational stress tests. (A) The generalization performance of the DOC-Forest and three univariate models as signal-to-noise
ratio is gradually reduced on the testing set. The noise was generated independently from Gaussian distributions with mean and variance
parameters from each feature with 50 realizations, scaled by the signal-to-noise ratio parameter and added to the testing set, such that at 1/10 the
noise was 10 times stronger than the signal. The standard deviation of performance over realizations is indicated by the shaded areas. It can be
readily seen that the DOC-Forest survives longest while at the same time decreasing its performance more slowly than each of the three
univariate models. In general, univariate models did not survive a signal to noise ratio of 1/100 or smaller while the DOC-Forest still showed
meaningful generalization performance beyond such low SNR values. (B) We estimated the impact of misdiagnosis on generalization empirically by
flipping the diagnosis labels for an increasing percentage of patients (0 to 100 in steps of five). To avoid bias and estimate variability, we randomly
drew patients at each percentage level and repeated the process 50 times. The median generalization performance is depicted by the boxplots
(whiskers show the 2.5 and 97.5 percentiles) and the mean performance by the superimposed red circles. The performance at 0% and 100%
flipping is shown by the red circles. For convenience, the percentage of misdiagnoses predicted from the number of CRS-R assessments reported
by Wannez et al. (2017) is superimposed by the coloured dotted lines. It can be seen that the mean generalization performance drops more slowly
between 10 and 30% than between 30 and 50% and remains reasonable even if up to 30% of the diagnoses are flipped. PE = permutation entropy.
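
The two stress tests described in this caption can be sketched as data-corruption helpers. This is one possible reading of the description, not the authors' code: the function names `add_noise` and `flip_labels` are hypothetical, and the noise is assumed to be drawn per feature from a Gaussian with that feature's mean and standard deviation and multiplied by 1/SNR, so that at SNR = 1/10 the noise is scaled tenfold.

```python
import random
import statistics


def add_noise(X, snr, rng):
    """Return a noisy copy of X (rows = recordings, columns = markers)."""
    columns = list(zip(*X))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    scale = 1.0 / snr  # e.g. snr = 1/10 gives noise scaled by 10
    return [
        [x + scale * rng.gauss(mu, sd) for x, mu, sd in zip(row, means, sds)]
        for row in X
    ]


def flip_labels(y, percent, rng):
    """Flip the binary diagnosis (0/1) of a random `percent` of patients."""
    n_flip = round(len(y) * percent / 100)
    flipped = set(rng.sample(range(len(y)), n_flip))
    return [1 - label if i in flipped else label for i, label in enumerate(y)]


rng = random.Random(0)

# Made-up testing-set markers (4 recordings x 2 markers) and labels.
X_test = [[0.2, 1.0], [0.4, 1.5], [0.1, 0.8], [0.5, 1.7]]
y_train = [0, 1] * 10

# (A) 50 noise realizations at each SNR level, applied to the testing set.
noisy_sets = {
    snr: [add_noise(X_test, snr, rng) for _ in range(50)]
    for snr in (1.0, 1 / 10, 1 / 100, 1 / 1000)
}

# (B) 0 to 100% flipped training labels in steps of five, 50 draws each.
flipped_sets = {
    percent: [flip_labels(y_train, percent, rng) for _ in range(50)]
    for percent in range(0, 101, 5)
}
print(len(noisy_sets), len(flipped_sets))
```

Each corrupted testing set or label vector would then be passed to the already trained (for A) or retrained (for B) classifier, and the resulting AUC values summarized across the 50 realizations.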

individual markers were highly variable (Fig. 2A, Supplementary Figs 1 and 2). Nonetheless, our DOC-Forest fluctuated narrowly between AUC scores of 0.72 and 0.77 (Fig. 2D). Inspection of our classifier in terms of the variable importance revealed a striking pattern (Fig. 2C and Supplementary Fig. 1B). Markers that were most influential for its classifications not only were the ones with the greatest individual discrimination performance, but also turned out to be less susceptible to changes in the EEG configuration, noise on the EEG features and noise in the diagnostic labels (Figs 4 and 5). Interestingly, the overall relationship between univariate performance and variable importance was not linear. As univariate marker performance increased, marker importance increased disproportionally, i.e. at the top of the distribution, a change in univariate AUC led to a bigger change in importance than at the bottom of the distribution. Our findings, therefore, suggest that our DOC-Forest provides robust learning of UWS versus MCS diagnosis by enhancing the impact of robust EEG markers.

In this context, it may be interesting to consider the recently issued warning that predictive variables are not necessarily the ones that differ significantly (Lo et al., 2015; Bzdok et al., 2018). As the AUC can be regarded as a rescaled Mann-Whitney U-test (Supplementary material), significant univariate classification as in Sitt et al. (2014) implies significant differences in a marker between the diagnoses. The presence of univariate classification success and its positive correlation with multivariate variable importance suggests that, in the present study, more significant variables were more predictive while less predictive variables were less significant.

Robust classification was driven by distinct alpha and theta frequency band dimensions

Our findings suggested that protocol-general markers were, overall, more reliable. Strikingly, these markers, belonging to different conceptual families, were all related to neuronal dynamics in the theta and alpha range (Figs 3 and 4). The robustness of these markers may be explained by the fact that no excessive averaging is needed for their extraction and their characteristic EEG topographies are simple and easy to capture with few sensors. However, the tight relationship between variable importance and conditional mutual information (Louppe, 2014) suggests that these top performing markers carry independent information. Indeed, recent research has suggested a rather complex picture of functional

and pathophysiological landscapes. The complexity of theta-band signals and their long-range interactions could reflect distinct memory processes underlying consciousness, such as access and maintenance (Axmacher et al., 2010). Similarly, alpha-band power may reflect global arousal and demands for dynamic inhibition required for functional encapsulation of cortical networks (for an overview see Sadaghiani and Kleinschmidt, 2016). Moreover, intact consciousness has been related to the peak frequency of alpha and theta band oscillations originating from distinct cerebral generators (Schiff, 2010; Williams et al., 2013). In fact, the mesocircuit model predicts that the downregulation of the thalamo-cortical circuits following a brain injury should be directly associated with changes in the interactions within these frequency bands observed in this study (Victor et al., 2011; Schiff et al., 2014). Yet, this is further complicated by the fact that these generators can be selectively disrupted for different aetiologies and can show a variety of regional effects during anaesthesia (Purdon et al., 2013). While future experimental research is desirable to disentangle these facets, our findings suggest that the presence of independent physiological sources of information may enhance generalization as it is unlikely that all of their measurements will be corrupted at the same time on new data.

But do our results imply that less important variables were useless? Not necessarily. Many evoked markers enjoy a high degree of neuroscientific validation and intuitively support clinical reasoning. The P3 markers, for example, belong to the most studied indices of consciousness in the EEG literature and are commonly used in brain-computer interface settings (Lulé et al., 2013). They have been related to processing novelty in bottom-up information, the global neuronal workspace, access consciousness, and context-updating (Donchin and Coles, 1988; Pins, 2003; Sergent et al., 2005; Dehaene et al., 2006; Polich, 2007). Considering such markers for MVPA may, thus, improve interpretability. Additionally, evoked markers indexing auditory novelty have been shown to be specific rather than sensitive (King et al., 2013b). Likewise, it could be the case that candidate markers of conscious access, e.g. P3b, may be more relevant to distinguish MCS+ from MCS– patients (Naccache, 2018). Although being de-emphasized by the DOC-Forest, evoked markers may still have contributed positively. Indeed, excluding all evoked markers from the Paris 1 to Paris 2 generalization actually reduced DOC-Forest performance marginally (AUC = 0.71, 95% CI: 0.618–0.807, SD = 0.049). One could, therefore, argue that evoked markers should be considered for MVPA of DOC whenever available, alongside a few robust markers.

EEG markers of consciousness are shared between protocols and contexts

In the field of clinical neuroscience, cross-validation is commonly used to assess MVPA performance. However, it has been shown that cross-validation can give positively biased performance estimates (Saeb et al., 2017; Varoquaux et al., 2016; Varoquaux, 2018; Woo et al., 2017). Beyond cross-validation, here, we demonstrated significant, positive generalization to independent EEG data from a different EEG protocol recorded by an independent research group (Fig. 4) and did not observe considerable deviations from cross-validation scores. Generalization from the Paris to the Liège dataset even showed marginal improvements over cross-validation. As noted previously, this could not be explained by the absence of evoked markers. Precluding the possibility of random selection bias, this may suggest that either the signal quality or the diagnostic information may have been more favourable on the Liège data. Interestingly, compared to the best markers, i.e. alpha band power and its fluctuations, the advantage of the DOC-Forest was only marginal by a few AUC points. In contrast, the other remaining univariate models (based on theta band permutation entropy and theta wSMI) did not generalize significantly. Thus, our findings demonstrate that single markers can yield reasonable stand-alone classifiers but also expose the difficulty of anticipating which marker will actually succeed. Fortunately, MVPA potentially solves this selection problem with greater success by learning predictive profiles of markers. Indeed, we observed that DOC-Forest was more robust than individual markers when using different combinations of EEG configurations for training and testing. Likewise, we observed that univariate classifiers collapsed earlier and faster than the DOC-Forest as we experimentally corrupted the training data (Fig. 5).

The significant generalization from task to resting state EEG deserves separate consideration. It is conceivable that EEG markers related to the so-called functional axis of consciousness (Sergent et al., 2017) are accessible during task and resting state EEG. Accordingly, changing states of consciousness should impact markers of global house-keeping functions such as alpha band power, global long-range connectivity or signal complexity, irrespective of the context. For instance, for a patient with locked-in syndrome we observed EEG patterns similar to healthy persons during rest (Rohaut et al., 2017) and here we also demonstrate the discrimination of two cognitive motor dissociation patients from UWS patients from their resting state EEG. This can be explained by the fact that we observed significant generalization from task to resting state EEG by several EEG markers, principally for alpha band power (Fig. 3B, right).

Practical implications and suggestions

How long should EEG recordings be to yield a useful feature space for machine learning?

Our results suggest that reasonable results can be achieved with a short duration EEG recording (30 s to 3 min). This

potentially broadens the scope of protocols usable in practice and encourages development of fast, time-resolved, economic screening tasks.

How many EEG sensors should be used?

When high-density nets are available, using the full configuration turns out to be beneficial for model fitting. However, results based on 16 sensors from a 10-20 montage scheme are already encouraging. As a consequence, this supports the idea that data can be successfully pooled over various EEG systems even when the number of electrodes differs.

Which EEG protocol should be used?

Both univariate and multivariate analysis suggested that EEG markers of consciousness are accessible using task and resting state data. This suggests that protocols can be liberally combined in clinical practice and encourages the development of simpler and faster screening routines as compared to a full-blown cognitive experiment encompassing hundreds of trials.

Can classification models generalize to data from other sites?

Our findings demonstrate prospective generalization to new data from younger cohorts and data from other research laboratories. The use of robust methods is particularly recommended to alleviate the problem of changing marker distributions between datasets.

When should multivariate analysis be preferred to predict diagnosis?

Multivariate classification is more resilient to changes of marker distributions across datasets, be it because of noise in the signals or in the training labels, differences of populations or differences in EEG configurations and protocols. Beyond optimizing accuracy, multivariate classification models therefore yield more dependable classification performance.

How to extract biological insight from machine learning models

Here we demonstrate how the careful inspection of multivariate variable importance scores supplements the univariate analysis in qualifying interdependencies between EEG markers. While such insight may also be obtained from model coefficients of linear models, the variable importance metric as used in this study is not limited to linear relationships and does not necessitate explicit definition of non-linear effects or interaction effects.

Besides these specific points, we want to emphasize that we did not find one single globally best biomarker and that using machine learning tools to robustly combine theoretically heterogeneous markers is the recommended strategy.

Conclusion

In the current study, we demonstrate that electrophysiological markers of consciousness can be robustly exploited across contexts and protocols by relying on robust machine learning techniques. In this context, the proposed feature-extraction method based on multiple summary statistics was particularly useful as it permits one to abstract away specific sensor layouts, recording protocols and local EEG methodologies. Future work will have to demonstrate if the here-proposed ‘robust tool for detecting state-of-consciousness in brain-injured patients’ can be extended to a ‘robust neurophysiological marker of conscious state’. It will have to be demonstrated that the proposed model can generalize to other loss of consciousness scenarios, such as sleep or anaesthesia. We wish that our findings and our publicly released strategy for classification will contribute to building large datasets that could eventually enable intensely data-driven, cross-centre approaches to treatment of severely brain-injured patients and understanding the neural underpinnings of conscious processing.

Acknowledgements

We thank Charlene Aubinet, Olivier Bodart, Manon Carrier, Athena Demertzi, Charlotte Martial and Sarah Wannez for their contributions to the clinical evaluation of the patients. We would like to thank the UNICOG and Parietal team at Neurospin for repeated fruitful and stimulating discussion on this research project. We would also like to express our gratitude to the MNE community. The current study would not have been possible without our collaborative software development efforts. We specifically acknowledge helpful discussions and comments on this study by Alexandre Gramfort, Benjamin de Haas, Danilo Bzdok, Johan Stender, Stefania de Vito and Virginie van Wassenhove (alphabetical order). This study is dedicated to the patients and to their close relatives.

Funding

This work was supported by an ERC proof of concept grant issued to S.D., Institut National de la Santé et de la Recherche Médicale (France), the James S. McDonnell Foundation, the Institut du Cerveau et de la Moelle Épinière (France) to L.N., Consejo Nacional de Investigaciones Científicas y Técnicas (Argentina), the FRM Equipe 2015 grant to L.N., STIC-AmSud grants Complexity as a neural marker: applications to EEG and natural language processing and RTBRAIN - Towards Real-time processing of brain signals, the Belgian Funds for Scientific Research (FRS-FNRS), the European Commission, the European Union’s Horizon 2020 Framework Programme for Research and Innovation under the Specific Grant Agreement No. 720270 (Human

Brain Project SGA1) and No. 785907 (Human Brain Project SGA2), the Luminous project (EU-H2020-fetopen-ga686764), the Center-tbi, the European Space Agency, Belspo, ‘Fondazione Europea di Ricerca Biomedica’, the BIAL Foundation, Wallonia-Brussels Federation Concerted Research Action and the Mind Science Foundation. S.D. gratefully acknowledges additional support from CIFAR. D.E. gratefully acknowledges support by the INRIA starting researcher grant 2016, the Amazon Web Services research grant, and by the ERCYStG-263584 during 2015–16 issued to Virginie van Wassenhove. O.G. is post-doctoral fellow and S.L. is research director at FRS-FNRS.

Supplementary material

Supplementary material is available at Brain online.

References

Axmacher N, Henseler MM, Jensen O, Weinreich I, Elger CE, Fell J. Cross-frequency coupling supports multi-item working memory in the human hippocampus. Proc Natl Acad Sci USA 2010; 107: 3228–33.
Bayne T, Hohwy J, Owen AM. Are there levels of consciousness? Trends Cogn Sci 2016; 20: 405–13.
Bekinschtein TA, Dehaene S, Rohaut B, Tadel F, Cohen L, Naccache L. Neural signature of the conscious processing of auditory regularities. Proc Natl Acad Sci USA 2009; 106: 1672–7.
Bruno MA, Vanhaudenhuyse A, Thibaut A, Moonen G, Laureys S. From unresponsive wakefulness to minimally conscious PLUS and functional locked-in syndromes: recent advances in our understanding of disorders of consciousness. J Neurol 2011; 258: 1373–84.
Bzdok D, Engemann D-A, Grisel O, Varoquaux G, Thirion B. Prediction and inference diverge in biomedicine: simulations and real-world data. bioRxiv 2018. doi: 10.1101/327437.
Casali AG, Gosseries O, Rosanova M, Boly M, Sarasso S, Casali KR, et al. A theoretically based index of consciousness independent of sensory processing and behavior. Sci Transl Med 2013; 5: 198ra105.
Chang HY, Nuyten DSA, Sneddon JB, Hastie T, Tibshirani R, Sørlie T, et al. Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc Natl Acad Sci USA 2005; 102: 3738–43.
Chennu S, Annen J, Wannez S, Thibaut A, Chatelle C, Cassol H, et al. Brain networks predict metabolism, diagnosis and prognosis at the bedside in disorders of consciousness. Brain 2017; 140: 2120–32.
Claassen J, Velazquez A, Meyers E, Witsch J, Falo MC, Park S, et al. Bedside quantitative electroencephalography improves assessment of consciousness in comatose subarachnoid hemorrhage patients. Ann Neurol 2016; 80: 541–53.
Cruse D, Chennu S, Chatelle C, Bekinschtein TA, Fernández-Espejo D, Pickard JD, et al. Bedside detection of awareness in the vegetative state: a cohort study. Lancet 2012; 378: 2088–94.
Curley WH, Forgacs PB, Voss HU, Conte MM, Schiff ND. Characterization of EEG signals revealing covert cognition in the injured brain. Brain 2018; 141: 1404–21.
Dehaene S, Changeux J-P, Naccache L, Sackur J, Sergent C. Conscious, preconscious, and subliminal processing: a testable taxonomy. Trends Cogn Sci 2006; 10: 204–11.
Dehaene S, Naccache L. Towards a cognitive neuroscience of consciousness: basic evidence and a workspace framework. Cognition 2001; 79: 1–37.
Demertzi A, Antonopoulos G, Heine L, Voss HU, Crone JS, De Los Angeles C, et al. Intrinsic functional connectivity differentiates minimally conscious from unresponsive patients. Brain 2015; 138: 2619–31.
Demertzi A, Gomez F, Crone JS, Vanhaudenhuyse A, Tshibanda L, Noirhomme Q, et al. Multiple fMRI system-level baseline connectivity is disrupted in patients with consciousness alterations. Cortex 2014; 52: 35–46.
Donchin E, Coles MGH. Is the P300 component a manifestation of context updating. Behav Brain Sci 1988; 11: 357–427.
Efron B, Tibshirani R. An introduction to the bootstrap. New York, NY: Chapman & Hall; 1993.
Emmons WH, Simon CW. EEG, consciousness, and sleep. Science 1956; 124: 1066–9.
Engemann D, Raimondo F, King J-R, Jas M, Gramfort A, Dehaene S, et al. Automated measurement and prediction of consciousness in vegetative and minimally conscious patients. In: ICML workshop on statistics, machine learning and neuroscience 2015. Lille, France; 2015.
Faugeras F, Rohaut B, Valente M, Sitt J, Demeret S, Bolgert F, et al. Survival and consciousness recovery are better in the minimally conscious state than in the vegetative state. Brain Inj 2018; 32: 72–7.
Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Machine Learning 2006; 63: 3. Springer/Kluwer Academic Publishers.
Giacino JT, Ashwal S, Childs N, Cranford R, Jennett B, Katz DI, et al. The minimally conscious state definition and diagnostic criteria. Neurology 2002; 58: 349–53.
Giacino JT, Kalmar K, Whyte J. The JFK Coma Recovery Scale-Revised: Measurement characteristics and diagnostic utility. Arch Phys Med Rehabil 2004; 85: 2020–9.
Goldfine AM, Victor JD, Conte MM, Bardin JC, Schiff ND. Determination of awareness in patients with severe brain injury using EEG power spectral analysis. Clin Neurophysiol 2011; 122: 2157–68.
Gosseries O, Zasler ND, Laureys S. Recent advances in disorders of consciousness: focus on the diagnosis. Brain Inj 2014; 28: 1141–50.
Gramfort A, Luessi M, Larson E, Engemann D, Strohmeier D, Brodbeck C, et al. MNE software for processing MEG and EEG data. Neuroimage 2014; 86: 446–60.
Iotzov I, Fidali BC, Petroni A, Conte MM, Schiff ND, Parra LC. Divergent neural responses to narrative speech in disorders of consciousness. Ann Clin Transl Neurol 2017; 4: 784–92.
Jas M, Engemann DA, Bekhti Y, Raimondo F, Gramfort A. Autoreject: automated artifact rejection for MEG and EEG data. Neuroimage 2017; 159: 417–29.
Jennett B, Plum F. Persistent vegetative state after brain damage: a syndrome in search of a name. Lancet 1972; 299: 734–7.
King J-R, Sitt JD, Faugeras F, Rohaut B, Karoui I El, Cohen L, et al. Information sharing in the brain indexes consciousness in noncommunicative patients. Curr Biol 2013a; 23: 1914–19.
King JR, Faugeras F, Gramfort A, Schurger A, El Karoui I, Sitt JD, et al. Single-trial decoding of auditory novelty responses facilitates the detection of residual consciousness. Neuroimage 2013b; 83: 726–38.
Laureys S, Celesia GG, Cohadon F, Lavrijsen J, José L-C, Sannita WG, et al. Unresponsive wakefulness syndrome: a new name for the vegetative state or apallic syndrome. BMC Med 2010; 8: 68.
Lo A, Chernoff H, Zheng T, Lo S-H. Why significant variables aren’t automatically good predictors. Proc Natl Acad Sci USA 2015; 112: 13892–7.
Louppe G, Wehenkel L, Sutera A, Geurts P. Understanding variable importances in forests of randomized trees. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, editors. Advances in
neural information processing systems 26 (NIPS). Lake Tahoe: Curran Associates, Inc., 2013; p. 431–439.
Louppe G. Understanding random forests: from theory to practice. PhD thesis. University of Liège, Faculty of Applied Sciences, Department of Electrical Engineering & Computer Science, 2014.
Luauté J, Maucort-Boulch D, Tell L, Quelard F, Sarraf T, Iwaz J, et al. Long-term outcomes of chronic minimally conscious and vegetative states. Neurology 2010; 75: 246–52.
Lulé D, Noirhomme Q, Kleih SC, Chatelle C, Halder S, Demertzi A, et al. Probing command following in patients with disorders of consciousness using a brain–computer interface. Clin Neurophysiol 2013; 124: 101–6.
Monti MM, Vanhaudenhuyse A, Coleman MR, Boly M, Pickard JD, Tshibanda L, et al. Willful modulation of brain activity in disorders of consciousness. N Engl J Med 2010; 362: 579–89.
Naccache L. Minimally conscious state or cortically mediated state? Brain 2018; 141: 949–60.
Naci L, Monti MM, Cruse D, Kübler A, Sorger B, Goebel R, et al. Brain-computer interfaces for communication with nonresponsive patients. Ann Neurol 2012; 72: 312–23.
Owen AM, Coleman MR, Boly M, Davis MH, Laureys S, Pickard JD. Detecting awareness in the vegetative state. Science 2006; 313: 1402.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825–30.
Phillips CL, Bruno M-A, Maquet P, Boly M, Noirhomme Q, Schnakers C, et al. “Relevance vector machine” consciousness classifier applied to cerebral metabolism of vegetative and locked-in patients. Neuroimage 2011; 56: 797–808.
Pins D. The neural correlates of conscious vision. Cereb Cortex 2003; 13: 461–74.
Polich J. Updating P300: an integrative theory of P3a and P3b. Clin Neurophysiol 2007; 118: 2128–48.
Purdon PL, Pierce ET, Mukamel EA, Prerau MJ, Walsh JL, Wong KFK, et al. Electroencephalogram signatures of loss and recovery of consciousness from propofol. Proc Natl Acad Sci USA 2013; 110: E1142–51.
Rohaut B, Claassen J. Decision making in perceived devastating brain injury: a call to explore the impact of cognitive biases. Br J Anaesth 2018; 120: 5–9.
Rohaut B, Raimondo F, Galanaud D, Valente M, Sitt JD, Naccache L. Probing consciousness in a sensory-disconnected paralyzed patient. Brain Inj 2017; 31: 1398–403.
Rosenberg GA, Johnson SF, Brenner RP. Recovery of cognition after prolonged vegetative state. Ann Neurol 1977; 2: 167–8.
Sadaghiani S, Kleinschmidt A. Brain networks and α-oscillations: structural and functional foundations of cognitive control. Trends Cogn Sci 2016; 20: 805–17.
Saeb S, Lonini L, Jayaraman A, Mohr DC, Kording KP. The need to approximate the use-case in clinical machine learning. Gigascience 2017; 6: 1–9.
Schiff ND. Recovery of consciousness after brain injury: a mesocircuit hypothesis. Trends Neurosci 2010; 33: 1–9.
Schiff ND. Cognitive motor dissociation following severe brain injuries. JAMA Neurol 2015; 72: 1413–15.
Schiff ND, Nauvel T, Victor JD. Large-scale brain dynamics in disorders of consciousness. Curr Opin Neurobiol 2014; 25: 7–14.
Schnakers C, Vanhaudenhuyse A, Giacino J, Ventura M, Boly M, Majerus S, et al. Diagnostic accuracy of the vegetative and minimally conscious state: clinical consensus versus standardized neurobehavioral assessment. BMC Neurol 2009; 9: 35.
Sergent C, Baillet S, Dehaene S. Timing of the brain events underlying access to consciousness during the attentional blink. Nat Neurosci 2005; 8: 1391–400.
Sergent C, Faugeras F, Rohaut B, Perrin F, Valente M, Tallon-Baudry C, et al. Multidimensional cognitive evaluation of patients with disorders of consciousness using EEG: a proof of concept study. Neuroimage Clin 2017; 13: 455–69.
Sitt JD, King J-R, El Karoui I, Rohaut B, Faugeras F, Gramfort A, et al. Large scale screening of neural signatures of consciousness in patients in a vegetative or minimally conscious state. Brain 2014; 137: 2258–70.
Stender J, Gosseries O, Bruno M-A, Charland-Verville V, Vanhaudenhuyse A, Demertzi A, et al. Diagnostic precision of PET imaging and functional MRI in disorders of consciousness: a clinical validation study. Lancet 2014; 384: 514–22.
Tononi G, Edelman GM. Consciousness and complexity. Science 1998; 282: 1846–51.
Varoquaux G. Cross-validation failure: small sample sizes lead to large error bars. Neuroimage 2018; 180: 68–77.
Varoquaux G, Raamana PR, Engemann DA, Hoyos-Idrobo A, Schwartz Y, Thirion B. Assessing and tuning brain decoders: cross-validation, caveats, and guidelines. Neuroimage 2016.
Victor JD, Drover JD, Conte MM, Schiff ND. Mean-field modeling of thalamocortical dynamics and a model-driven approach to EEG analysis. Proc Natl Acad Sci USA 2011; 108 (Suppl 3): 15631–8.
Wannez S, Heine L, Thonnard M, Gosseries O, Laureys S; Coma Science Group Collaborators. The repetition of behavioral assessments in diagnosis of disorders of consciousness. Ann Neurol 2017; 81: 883–9.
Williams ST, Conte MM, Goldfine AM, Noirhomme Q, Gosseries O, Thonnard M, et al. Common resting brain dynamics indicate a possible mechanism underlying zolpidem response in severe brain injury. Elife 2013; 2: e01157.
Woo C-W, Chang LJ, Lindquist MA, Wager TD. Building better biomarkers: brain models in translational neuroimaging. Nat Neurosci 2017; 20: 365–77.
