Holland et al.

Genome Biology (2020) 21:36

RESEARCH Open Access

Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data
Christian H. Holland1,2, Jovan Tanevski1,3, Javier Perales-Patón1, Jan Gleixner4,5, Manu P. Kumar6, Elisabetta Mereu7,
Brian A. Joughin6,8, Oliver Stegle4,5,9, Douglas A. Lauffenburger6, Holger Heyn7,10, Bence Szalai11 and
Julio Saez-Rodriguez1,2*

Abstract
Background: Many functional analysis tools have been developed to extract functional and mechanistic insight
from bulk transcriptome data. With the advent of single-cell RNA sequencing (scRNA-seq), it is in principle possible
to do such an analysis for single cells. However, scRNA-seq data has characteristics such as drop-out events and low
library sizes. It is thus not clear if functional TF and pathway analysis tools established for bulk sequencing can be
applied to scRNA-seq in a meaningful way.
Results: To address this question, we perform benchmark studies on simulated and real scRNA-seq data. We
include the bulk-RNA tools PROGENy, GO enrichment, and DoRothEA that estimate pathway and transcription
factor (TF) activities, respectively, and compare them against the tools SCENIC/AUCell and metaVIPER, designed for
scRNA-seq. For the in silico study, we simulate single cells from TF/pathway perturbation bulk RNA-seq experiments.
We complement the simulated data with real scRNA-seq data upon CRISPR-mediated knock-out. Our benchmarks
on simulated and real data reveal comparable performance to the original bulk data. Additionally, we show that the
TF and pathway activities preserve cell type-specific variability by analyzing a mixture sample sequenced with 13
scRNA-seq protocols. We also provide the benchmark data for further use by the community.
Conclusions: Our analyses suggest that bulk-based functional analysis tools that use manually curated footprint
gene sets can be applied to scRNA-seq data, partially outperforming dedicated single-cell tools. Furthermore, we
find that the performance of functional analysis tools is more sensitive to the gene sets than to the statistic used.
Keywords: scRNA-seq, Functional analysis, Transcription factor analysis, Pathway analysis, Benchmark

* Correspondence: [Link]@[Link]
1 Institute for Computational Biomedicine, Bioquant, Heidelberg University, Faculty of Medicine, and
Heidelberg University Hospital, Heidelberg, Germany
2 Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of
Medicine, Aachen, Germany
Full list of author information is available at the end of the article

© The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons
Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in
any medium, provided you give appropriate credit to the original author(s) and the source, provide a link
to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain
Dedication waiver applies to the data made available in this article, unless otherwise stated.

Background
Gene expression profiles provide a blueprint of the status of cells. Thanks to diverse high-throughput
techniques, such as microarrays and RNA-seq, expression profiles can be collected relatively easily and
are hence very common. To extract functional and mechanistic information from these profiles, many tools
have been developed that can, for example, estimate the status of molecular processes such as the
activity of pathways or transcription factors (TFs). These functional analysis tools are broadly used and
belong to the standard toolkit to analyze expression data [1–4].

Functional analysis tools typically combine prior knowledge with a statistical method to gain functional
and mechanistic insights from omics data. In the case of transcriptomics, prior knowledge is typically
rendered as gene sets containing genes belonging to, e.g., the same biological process or to the same
Gene Ontology (GO) annotation. The Molecular Signature Database (MSigDB) is one of the largest
collections of curated and annotated gene sets [5]. Statistical methods are as abundant as the different
types of gene sets. Among them,
the most commonly used are over-representation analysis (ORA) [6] and Gene Set Enrichment Analysis
(GSEA) [7]. Still, there is a growing number of statistical methods spanning from simple linear models to
advanced machine learning methods [8, 9].

Recent technological advances in single-cell RNA-seq (scRNA-seq) enable the profiling of gene expression
at the individual cell level [10]. Multiple technologies and protocols have been developed, and they have
experienced a dramatic improvement over recent years. However, single-cell data sets have a number of
limitations and biases, including low library size and drop-outs. Bulk RNA-seq tools that focus on cell
type identification and characterization as well as on inferring regulatory networks can be readily
applied to scRNA-seq data [11]. This suggests that functional analysis tools should in principle be
applicable to scRNA-seq data as well. However, it has not been investigated yet whether these limitations
could distort and confound the results, rendering the tools not applicable to single-cell data.

In this paper, we benchmarked the robustness and applicability of various TF and pathway analysis tools
on simulated and real scRNA-seq data. We focused on three tools for bulk and three tools for scRNA-seq
data. The bulk tools were PROGENy [12], DoRothEA [13], and classical GO enrichment analysis, combining GO
gene sets [14] with GSEA. PROGENy estimates the activity of 14 signaling pathways by combining
corresponding gene sets with a linear model. DoRothEA is a collection of resources of TF’s targets
(regulons) that can serve as gene sets for TF activity inference. For this study, we coupled DoRothEA
with the method VIPER [15] as it incorporates the mode of regulation of each TF-target interaction. Both
PROGENy’s and DoRothEA’s gene sets are based on observing the transcriptomic consequences (the
“footprint”) of the processes of interest rather than the genes composing the process as gene sets [16].
This approach has been shown to be more accurate and informative in inferring the process’s activity
[12, 17]. The tools specifically designed for application on scRNA-seq data that we considered are
SCENIC/AUCell [18] and metaVIPER [19]. SCENIC is a computational workflow that comprises the
construction of gene regulatory networks (GRNs) from scRNA-seq data that are subsequently interrogated to
infer TF activity with the statistical method AUCell. In addition, we coupled AUCell with the
footprint-based gene sets from DoRothEA and PROGENy that we hereafter refer to as D-AUCell and P-AUCell.
Using DoRothEA with both VIPER and AUCell on scRNA-seq for TF activity inference allowed us to compare
the underlying statistical methods more objectively. metaVIPER is an extension of VIPER which is based on
the same statistical method but relies on multiple GRNs such as tissue-specific networks.

We first benchmarked the tools on simulated single-cell transcriptome profiles. We found that on this in
silico data the footprint-based gene sets from DoRothEA and PROGENy can functionally characterize
simulated single cells. We observed that the performance of the different tools is dependent on the used
statistical method and properties of the data, such as library size. We then used real scRNA-seq data
upon CRISPR-mediated knock-out/knock-down of TFs [20, 21] to assess the performance of TF analysis tools.
The results of this benchmark further supported our finding that TF analysis tools can provide accurate
mechanistic insights into single cells. Finally, we demonstrated the utility of the tools for pathway and
TF activity estimation on recently published data profiling a complex sample with 13 different scRNA-seq
technologies [22]. Here, we showed that summarizing gene expression into TF and pathway activities
preserves cell-type-specific information and leads to biologically interpretable results. Collectively,
our results suggest that the bulk- and footprint-based TF and pathway analysis tools DoRothEA and PROGENy
partially outperform the single-cell tools SCENIC, AUCell, and metaVIPER. Although on scRNA-seq data
DoRothEA and PROGENy were less accurate than on bulk RNA-seq, we were still able to extract relevant
functional insight from scRNA-seq data.

Results
Robustness of bulk-based TF and pathway analysis tools against low gene coverage
Single-cell RNA-seq profiling is hampered by low gene coverage due to drop-out events [23]. In our first
analysis, we focused solely on the low gene coverage aspect and whether tools designed for bulk RNA-seq
can deal with it. Specifically, we aimed to explore how DoRothEA, PROGENy, and GO gene sets combined with
GSEA (GO-GSEA) can handle low gene coverage in general, independently of other technical artifacts and
characteristics from scRNA-seq protocols. Thus, we conducted this benchmark using bulk transcriptome
benchmark data. In these studies, single TFs and pathways are perturbed experimentally, and the
transcriptome profile is measured before and after the perturbation. These experiments can be used to
benchmark tools for TF/pathway activity estimation, as they should estimate correctly the change in the
perturbed TF or pathway. The use of these datasets allowed us to systematically control the gene coverage
(see the “Methods” section). The workflow consisted of four steps (Additional file 1: Figure S1a). In the
first step, we summarized all perturbation experiments into a matrix of contrasts (with genes in rows and
contrasts in
columns) by differential gene expression analysis. Subsequently, we randomly replaced, independently for
each contrast, logFC values with 0 so that we obtain a predefined number of “covered” genes with a logFC
unequal to zero. Accordingly, a gene with a logFC equal to 0 was considered as missing/not covered. Then,
we applied DoRothEA, PROGENy, and GO-GSEA to the contrast matrix, subsetted only to those experiments
which are suitable for the corresponding tool: TF perturbation for DoRothEA and pathway perturbation for
PROGENy and GO-GSEA. We finally evaluate the global performance of the methods with receiver operating
characteristic (ROC) and precision-recall (PR) curves (see the “Methods” section). This process was
repeated 25 times to account for stochasticity effects during inserting zeros in the contrast matrix (see
the “Methods” section).

DoRothEA’s TFs are accompanied by an empirical confidence level indicating the confidence in their
regulons, ranging from A (most confident) to E (less confident; see the “Methods” section). For this
benchmark, we included only TFs with confidence levels A and B (denoted as DoRothEA (AB)) as this
combination has a reasonable tradeoff between TF coverage and performance [13]. In general, the
performance of DoRothEA dropped as gene coverage decreased. While it showed reasonable prediction power
with all available genes (AUROC of 0.690), it approached almost the performance of a random model (AUROC
of 0.5) when only 500 genes were covered (mean AUROC of 0.547, Fig. 1a, and similar trend with AUPRC,
Additional file 1: Figure S1b).

We next benchmarked pathway activities estimated by PROGENy and GO-GSEA. In the original PROGENy
framework, 100 footprint genes are used per pathway to compute pathway activities by default, as it has
been shown that this leads to the best performance on bulk samples [12]. However, one can extend the
footprint size to cover more genes of the expression profiles. We reasoned that this might counteract low
gene coverage and implemented accordingly different PROGENy versions (see the “Methods” section). With
the default PROGENy version (100 footprint genes per pathway), we observed a clear drop in the global
performance with decreasing gene coverage, even though less drastic than for DoRothEA (from AUROC of
0.724 to 0.636, Fig. 1b, similar trends with AUPRC, Additional file 1: Figure S1c). As expected, PROGENy
performed the best with 100 footprint genes per pathway when there is complete gene coverage. The
performance differences between the various PROGENy versions shrank with decreasing gene coverage. This
suggests that increasing the number of footprint genes can help to counteract low gene coverage. To
provide a fair comparison between PROGENy and GO-GSEA, we used only those 14 GO terms that match the 14
PROGENy pathways (Additional file 1: Figure S1d). In general, GO-GSEA showed weaker performance than
PROGENy. The decrease in performance was more prominent as gene coverage decreased (from AUROC of 0.662
to 0.525, Fig. 1c, and similar trend with AUPRC, Additional file 1: Figure S1e). With a gene coverage of
less than 2000 genes, GO-GSEA performance was no better than random.

As our benchmark data set comprises multiple perturbation experiments per pathway, we also evaluated the
performance of PROGENy and GO-GSEA at the pathway level (Additional file 1: Figure S2a and b). The
pathway-wise evaluation supported our finding that PROGENy outperforms GO-GSEA across all gene

Fig. 1 Testing the robustness of DoRothEA (AB), PROGENy, and GO-GSEA against low gene coverage. a DoRothEA (AB) performance (area under
ROC curve, AUROC) versus gene coverage. b PROGENy performance (AUROC) for different number of footprint genes per pathway versus gene
coverage. c Performance (AUROC) of GO-GSEA versus gene coverage. The dashed line indicates the performance of a random model. The colors
in a and c are meant only as a visual support to distinguish between the individual violin plots and jittered points
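
The gene coverage control behind Fig. 1 (randomly setting logFC values to 0, independently for each contrast, until a predefined number of genes remains covered) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the dict-of-dicts layout of the contrast matrix, and the seed handling are assumptions made for illustration only.

```python
import random


def mask_to_coverage(logfc, n_covered, rng):
    """Keep the logFC of `n_covered` randomly chosen genes; set all others to 0.

    `logfc` maps gene name -> logFC for a single contrast. A gene whose
    logFC is 0 afterwards is treated as missing / not covered.
    """
    nonzero_genes = [g for g, v in logfc.items() if v != 0]
    if n_covered > len(nonzero_genes):
        raise ValueError("n_covered exceeds the number of genes with non-zero logFC")
    kept = set(rng.sample(nonzero_genes, n_covered))
    return {g: (v if g in kept else 0.0) for g, v in logfc.items()}


def mask_contrast_matrix(contrasts, n_covered, seed=0):
    """Apply the masking independently to every contrast (column).

    `contrasts` maps contrast name -> {gene: logFC}. This layout is an
    assumption of the sketch, not the data structure used in the paper.
    """
    rng = random.Random(seed)
    return {name: mask_to_coverage(col, n_covered, rng)
            for name, col in contrasts.items()}
```

Repeating the call with different seeds (the text reports 25 repetitions) would give one masked matrix per repetition, each of which could then be passed to the tool under test.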

coverages, but the performance between pathways is variable.

In summary, this first benchmark provided insight into the general robustness of the bulk-based tools
DoRothEA, PROGENy, and GO-GSEA with respect to low gene coverage. DoRothEA performed reasonably well down
to a gene coverage of 2000 genes. The performance of all different PROGENy versions was robust across the
entire gene coverage range tested. GO-GSEA showed a worse performance than PROGENy, especially in the low
gene coverage range. Since DoRothEA and PROGENy showed promising performance in low gene coverage ranges,
we decided to explore them on scRNA-seq data. Due to its poor performance, we did not include GO-GSEA in
the subsequent analyses.

Benchmark on simulated single-cell RNA-seq data
For the following analyses, we expanded the set of tools with the statistical methods AUCell that we
decoupled from the SCENIC workflow [18] and metaVIPER [19]. Both methods were developed specifically for
scRNA-seq analysis and thus allow the comparison of bulk vs single-cell based tools on scRNA-seq data.
AUCell is a statistical method that is originally used with GRNs constructed by SCENIC and assesses
whether gene sets are enriched in the top quantile of a ranked gene signature (see the “Methods”
section). In this study, we combined AUCell with DoRothEA’s and PROGENy’s gene sets (referred to as
D-AUCell and P-AUCell, respectively). metaVIPER is an extension of VIPER and requires multiple gene
regulatory networks instead of a single network. In our study, we coupled 27 tissue-specific gene
regulatory networks with metaVIPER, which provides a single TF consensus activity score estimated across
all networks (see the “Methods” section). To benchmark all these methods on single cells, ideally, we
would have scRNA-seq datasets after perturbations of TFs and pathways. However, these datasets,
especially for pathways, are currently very rare. To perform a comprehensive benchmark study, we
developed a strategy to simulate samples of single cells using bulk RNA-seq samples from TF and pathway
perturbation experiments.

A major cause of drop-outs in single-cell experiments is the abundance of transcripts in the process of
reverse-transcription of mRNA to cDNA [23]. Thus, our simulation strategy was based on the assumption
that genes with low expression are more likely to result in drop-out events.

The simulation workflow started by transforming read counts of a single bulk RNA-seq sample to
transcripts per million (TPM), normalizing for gene length and library size. Subsequently, for each gene,
we assigned a sampling probability by dividing the individual TPM values with the sum of all TPM values.
These probabilities are proportional to the likelihood for a given gene not to “drop-out” when simulating
a single cell from the bulk sample. We determined the total number of gene counts for a simulated single
cell by sampling from a normal distribution with a mean equal to the desired library size which is
specified as the first parameter of the simulation. We refer hereafter to this number as the library
size. For every single cell, we then sampled with replacement genes from the gene probability vector up
to the determined library size. The frequency of occurrence of individual genes becomes the new gene
count in the single cell. The number of simulated single cells from a single bulk sample can be specified
as the second parameter of the simulation. Of note, this parameter is not meant to reflect a realistic
number of cells, but it is rather used to investigate the loss of information: the lower the number of
simulated cells, the more information is lost from the original bulk sample (Fig. 2a; see the “Methods”
section). This simple workflow guaranteed that the information of the original bulk perturbation is
preserved and scRNA-seq characteristics, such as drop-outs, low library size, and a high number of
samples/cells are introduced.

Our bulk RNA-seq samples comprised 97 single TF perturbation experiments targeting 52 distinct TFs and 15
single pathway perturbation experiments targeting 7 distinct pathways (Additional file 1: Figure S3a and
b; see the “Methods” section). We repeated the simulation of single cells from each bulk sample template
to account for the stochasticity of the simulation procedure. We tested our simulation strategy by
comparing the characteristics of the simulated cells to real single cells. In this respect, we compared
the count distribution (Additional file 1: Figure S4a), the relationship of mean and variance of gene
expression (Additional file 1: Figure S4b), and the relationship of library size to the number of
detected genes (Additional file 1: Figure S4c). These comparisons suggested that our simulated single
cells closely resemble real single cells and are thus suitable for benchmarking.

Unlike in our first benchmark, we applied the TF and pathway analysis tools directly on single
samples/cells and built the contrasts between perturbed and control samples at the level of pathway and
TF activities (see the “Methods” section). We compared the performance of all tools to recover the
perturbed TFs/pathways. We also considered the performance on the template bulk data, especially for the
bulk-based tools DoRothEA and PROGENy, as a baseline for comparison to their respective performance on
the single-cell data.

We show, as an example, the workflow of the performance evaluation for DoRothEA (Fig. 2b, 1. Step). As a
first step, we applied DoRothEA to single cells generated for one specific parameter combination and bulk

Fig. 2 Benchmark results of TF and pathway analysis tools on simulated scRNA-seq data. a Simulation strategy of single cells from an RNA-seq
bulk sample. b Example workflow of DoRothEA’s performance evaluation on simulated single cells for a specific parameter combination (number
of cells = 10, mean library size = 5000). 1. Step: ROC curves of DoRothEA’s performance on single cells (25 replicates) and on bulk data including
only TFs with confidence level A. 2. Step: DoRothEA performance on single cells and bulk data summarized as AUROC vs TF coverage. TF
coverage denotes the number of distinct perturbed TFs in the benchmark dataset that are also covered by the gene set resource (see
Additional file 1: Figure S3a). Results are provided for different combinations of DoRothEA’s confidence levels (A, B, C, D, E). Error bars of AUROC
values depict the standard deviation and correspond to different simulation replicates. 3. Step: Averaged difference across all confidence level
combinations between AUROC of single cells and bulk data for all possible parameter combinations. The letters within the tiles indicate which
confidence level combination performs the best on single cells. The tile marked in red corresponds to the parameter setting used for previous
plots (Steps 1 and 2). c D-AUCell and d metaVIPER performance on simulated single cells summarized as AUROC for a specific parameter
combination (number of cells = 10, mean library size = 5000) and corresponding bulk data vs TF coverage. e, f Performance results of e PROGENy
and f P-AUCell on simulated single cells for a specific parameter combination (number of cells = 10, mean library size = 5000) and corresponding
bulk data in ROC space vs number of footprint genes per pathway. c–f Plots revealing the change in performance for all possible parameter
combinations (Step 3) are available in Additional file 1: Figure S7. b–f The dashed line indicates the performance of a random model
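
The simulation strategy of Fig. 2a (bulk counts to TPM, TPM to sampling probabilities, a library size drawn from a normal distribution, then sampling genes with replacement) can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the function names, the `sd_frac` parameter (the text here does not state the standard deviation of the normal distribution), and the use of gene lengths in kilobases are assumptions.

```python
import random
from collections import Counter


def tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM)."""
    rate = {g: counts[g] / lengths_kb[g] for g in counts}  # length-normalised
    total = sum(rate.values())  # must be > 0, i.e. at least one expressed gene
    return {g: r / total * 1e6 for g, r in rate.items()}


def simulate_cells(counts, lengths_kb, n_cells, mean_lib_size,
                   sd_frac=0.1, seed=0):
    """Simulate `n_cells` single cells from one bulk sample.

    `sd_frac` (standard deviation as a fraction of the mean library size)
    is a hypothetical parameter introduced only for this sketch.
    """
    rng = random.Random(seed)
    t = tpm(counts, lengths_kb)
    genes = list(t)
    total = sum(t.values())
    probs = [t[g] / total for g in genes]  # sampling probability per gene
    cells = []
    for _ in range(n_cells):
        # Library size of this cell: normal draw around the desired mean.
        lib = max(1, round(rng.gauss(mean_lib_size, sd_frac * mean_lib_size)))
        # Sample genes with replacement; frequencies become the new counts.
        drawn = rng.choices(genes, weights=probs, k=lib)
        cells.append(Counter(drawn))
    return cells
```

Genes that are never drawn are simply absent from a cell's `Counter` and read as 0, which is how lowly expressed genes end up as drop-outs in this sketch.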

samples, performed differential activity analysis (see the “Methods” section), and evaluated the
performance with ROC and PR curves including only TFs with confidence level A. In this example, we set
the number of cells to 10 as this reflects an observable loss of information of the original bulk sample
and the mean library size to 5000 as this corresponds to a very low but still realistic sequencing depths
of scRNA-seq experiments. Each repetition of the simulation is depicted by an individual ROC curve, which
shows the variance in the performance of DoRothEA on simulated single-cell data (Fig. 2b, 1. Step). The
variance decreases as the library size and the number of cells increase (which holds true for all tested
tools, Additional file 1: Figure S5a–e). The shown ROC curves are summarized into a single AUROC value
for bulk and mean AUROC value for single cells. We performed this procedure also for different TF
confidence level combinations and show the performance change in these values in relation to the number
of distinct perturbed TFs in the benchmark that are also covered by the gene set resources that we refer
to as TF coverage (Fig. 2b, 2. Step). For both bulk and single cells, we observe a tradeoff between TF
coverage and performance caused by including different TF confidence level combinations in the benchmark.
This result is supported by both AUROC and AUPRC (Additional file 1: Figure S6a) and corresponds to our
previous findings [13]. The performance of DoRothEA on single cells does not reach the performance on
bulk, though it can still recover TF perturbations on the simulated single cells reasonably well. This is
especially evident for the most confident TFs (AUROC of 0.690 for confidence level A and 0.682 for the
confidence level combination AB). Finally, we explore the effect of the simulation parameters library
size and the number of cells on the performance by performing the previously described analysis for all
combinations of library sizes and cell numbers. We computed the mean difference between AUROC scores of
single-cell and bulk data across all confidence level combinations. A negative difference indicates that
the tool of interest performs overall better on bulk data than on scRNA-seq data, and a positive
difference that it performs better on scRNA-seq. We observed a gradually decreasing negative difference
approaching 0 when the size of the library and the number of cells increase (Fig. 2b, 3. Step, and
Additional file 1: Figure S7a). Note, however, that the number of cells and thus the amount of lost
information of the original bulk sample has a stronger impact on the performance than the mean library
size. In addition, we identified the best performing combination of DoRothEA’s TF confidence levels for
different library sizes and the number of single cells. Thus, the results can be used as recommendations
for choosing the confidence levels on data from an experiment with comparable characteristics in terms of
sequencing depths.

Similarly to DoRothEA, we also observed for D-AUCell a tradeoff between TF coverage and performance on
both single cells and bulk samples when using the same parameter combination as before (Fig. 2c, similar
trend with AUPRC Additional file 1: Figure S6b). The summarized performance across all confidence level
combinations of D-AUCell on single cells slightly outperformed its performance on bulk samples (AUROC of
0.601 on single cells and 0.597 on bulk). This trend becomes more evident with increasing library size
and the number of cells (Additional file 1: Figure S7b).

For the benchmark of metaVIPER, we assigned confidence levels to the tissue-specific GTEx regulons based
on DoRothEA’s gene set classification. This was done for consistency with DoRothEA and D-AUCell, even if
there is no difference in confidence among them. Hence, for metaVIPER, we do not observe a tradeoff
between TF coverage and performance (Fig. 2d, similar trend with AUPRC Additional file 1: Figure S6c). As
opposed to D-AUCell, metaVIPER performed clearly better on single cells than on bulk samples across all
confidence level combinations (AUROC of 0.584 on single cells and 0.531 on bulk). This trend increased
with increasing library size and number of cells (Additional file 1: Figure S7c). However, the overall
performance of metaVIPER is worse than the performance of DoRothEA and D-AUCell. In summary, the
bulk-based tool DoRothEA performed the best on the simulated single cells followed by D-AUCell. metaVIPER
performed slightly better than a random model.

For the benchmark of pathway analysis tools, we observed that PROGENy performed well across different
number of footprint genes per pathway, with a peak at 500 footprint genes for both single cells and bulk
(AUROC of 0.856 for bulk and 0.831 for single cells, Fig. 2e, similar trend with AUPRC Additional file 1:
Figure S6d). A better performance for single-cell analysis with more than 100 footprint genes per pathway
is in agreement with the previous general robustness study that suggested that a higher number of
footprint genes can counteract low gene coverage. Similarly to the benchmark of TF analysis tools, we
studied the effect of the simulation parameters on the performance of pathway analysis tools. We averaged
for each parameter combination the performance difference between single cells and bulk across the
different versions of PROGENy. For the parameter combination associated with Fig. 2e (number of cells =
10, mean library size = 5000), the average distance is negative showing that the performance of PROGENy
on bulk was, in general, better than on single-cell data. Increasing the library size and the number of
cells improved the performance of PROGENy
on single cells reaching almost the same performance as on bulk samples (Additional file 1: Figure S7d).
For most parameter combinations, PROGENy with 500 or 1000 footprint genes per pathway yields the best
performance.

For P-AUCell, we observed a different pattern than for PROGENy as it worked best with 100 footprint genes
per pathway for both single cells and bulk (AUROC of 0.788 for bulk and 0.712 for single cells, Fig. 2f,
similar trends with AUPRC Additional file 1: Figure S6e). Similar to PROGENy, increasing the library size
and the number of cells improved the performance, but not to the extent of its performance on bulk
(Additional file 1: Figure S7e). For most parameter combinations, P-AUCell with 100 or 200 footprint
genes per pathway yielded the best performance.

In summary, both PROGENy and P-AUCell performed well on the simulated single cells, and PROGENy performed
slightly better. For pathway analysis, P-AUCell did not perform better on scRNA-seq than on bulk data. We
then went on to perform a benchmark analysis on real scRNA-seq datasets.

Benchmark on real single-cell RNA-seq data
After showing that the footprint-based gene sets from DoRothEA and PROGENy can handle low gene coverage
and work reasonably well on simulated scRNA-seq data with different statistical methods, we performed a
benchmark on real scRNA-seq data. However, single-cell transcriptome profiles of TF and pathway
perturbations are very rare. To our knowledge, there are no datasets of pathway perturbations on
single-cell level comprehensive enough for a robust benchmark of pathway analysis tools. For tools
inferring TF activities, the situation is better: recent studies combined CRISPR knock-outs/knock-down of
TFs with scRNA-seq technologies [20, 21] that can serve as potential benchmark data.

The first dataset is based on the Perturb-seq technology, which contains 26 knock-out perturbations
targeting 10 distinct TFs after 7 and 13 days of perturbations (Additional file 1: Figure S8a) [20]. To
explore the effect of perturbation time, we divided the dataset into two sub-datasets based on
perturbation duration (Perturb-seq (7d) and Perturb-seq (13d)). The second dataset is based on CRISPRi
protocol and contains 141 perturbation experiments targeting 50 distinct TFs [21] (Additional file 1:
Figure S8a). The datasets showed a variation in terms of drop-out rate, the number of cells, and
sequencing depths (Additional file 1: Figure S8b).

To exclude bad or unsuccessful perturbations in the case of CRISPRi experiments, we discarded experiments
when the logFC of the targeted gene/TF was greater than 0 (12 out of 141, Additional file 1: Figure S8c).
This quality control is important only in the case of CRISPRi, as it works on the transcriptional level.
Perturb-seq (CRISPR knock-out) acts on the genomic level, so we cannot expect a clear relationship
between KO efficacy and transcript level of the target. Note that the logFCs of both Perturb-seq
sub-datasets are in a narrower range in comparison to the logFCs of the CRISPRi dataset (Additional file
1: Figure S8d). The perturbation experiments that passed this quality check were used in the following
analyses.

We also considered the SCENIC framework for TF analysis [18]. We inferred GRNs for each sub-dataset using
this framework (see the “Methods” section). We set out to evaluate the performance of DoRothEA, D-AUCell,
metaVIPER, and SCENIC on each benchmark dataset individually.

To perform a fair comparison among the tools, we pruned their gene set resources to the same set of TFs.
However, the number of TFs in the dataset-specific SCENIC networks was very low (109 for Perturb-Seq
(7d), 126 for Perturb-Seq (13d), and 182 TFs for CRISPRi), yielding a low overlap with the other gene set
resources. Therefore, only a small fraction of the benchmark dataset was usable yielding low TF coverage.
Nevertheless, we found that DoRothEA performed the best on the Perturb-seq (7d) dataset (AUROC of 0.752,
Fig. 3a) followed by D-AUCell and SCENIC with almost identical performance (AUROC of 0.629 and 0.631,
respectively). metaVIPER performed just slightly better than a random model (AUROC of 0.533).
Interestingly, all tools performed poorly on the Perturb-seq (13d) dataset. In the CRISPRi dataset,
DoRothEA and D-AUCell performed the best with D-AUCell showing slightly better performance than DoRothEA
(AUROC of 0.626 for D-AUCell and 0.608 for DoRothEA). SCENIC and metaVIPER performed slightly better than
a random model. Given that we included in this analysis only shared TFs across all gene set resources, we
covered only 5 and 17 distinct TFs of the Perturb-seq and CRISPRi benchmark dataset.

To make better use of the benchmark dataset, we repeated the analysis without SCENIC, which resulted in a
higher number of shared TFs among the gene set resources and a higher TF coverage. The higher TF coverage
allowed us to investigate the performance of the tools in terms of DoRothEA’s confidence level. For both
Perturb-seq datasets, we found consistent results with the previous study when the TF coverage increased
from 5 to 10 (Fig. 3b). However, for the CRISPRi dataset, the performance of DoRothEA and metaVIPER
remained comparable to the previous study while the performance of D-AUCell dropped remarkably. These
trends can also be observed in PR-space (Additional file 1: Figure S8e).

In summary, these analyses suggested that the tools DoRothEA and D-AUCell, both interrogating the
manually curated, high-quality regulons from DoRothEA, are

Fig. 3 Benchmark results of TF analysis tools on real scRNA-seq data. a Performance of DoRothEA, D-AUCell, metaVIPER, and SCENIC on all sub benchmark datasets in ROC space vs TF coverage. b Performance of DoRothEA, D-AUCell, and metaVIPER on all sub benchmark datasets in ROC space vs TF coverage split up by combinations of DoRothEA’s confidence levels (A-E). a, b Within each panel, the results for all tools are based on the same set of (shared) TFs; this set differs between the two panels. TF coverage reflects the number of distinct perturbed TFs in the benchmark dataset that are also covered by the gene sets.
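
The AUROC values reported for this benchmark can be read as the probability that a perturbed TF is ranked above a non-perturbed one. The following is a minimal Python sketch, not the authors' code. It assumes that each (experiment, TF) pair carries one activity score and a binary label marking the perturbed TF, and that scores are negated so that a strongly reduced activity ranks first. The function name `auroc` and all toy values are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: higher means "more likely perturbed"
    labels: 1 for the perturbed TF of an experiment, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))


# Hypothetical TF activities from knock-out experiments (illustrative only).
activities = [-2.1, -0.3, 0.8, -1.2, -1.5, 0.1]
is_perturbed = [1, 0, 0, 1, 0, 1]

# Assumption: a successful perturbation lowers activity, so negate the scores.
scores = [-a for a in activities]
print(round(auroc(scores, is_perturbed), 3))
```

An AUROC of 0.5 corresponds to the random model mentioned in the text, and 1.0 to a perfect ranking of perturbed TFs.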

the best-performing tools to recover TF perturbation at the single-cell level of real data.

Application of TF and pathway analysis tools on samples of heterogeneous cell type populations (PBMC+HEK293T)
In our last analysis, we wanted to test the performance of all tested tools in a more heterogeneous system that would illustrate a typical scRNA-seq data analysis scenario where multiple cell types are present. We used a dataset from the Human Cell Atlas project [24] that contains scRNA-seq profiles of human peripheral blood mononuclear cells (PBMCs) and the HEK293T cell line with annotated cell types [22]. This dataset was analyzed with 13 different scRNA-seq protocols (see the “Methods” section). In this study, no ground truth (in contrast to the previous perturbation experiments) for TF and pathway activities was available. To evaluate the performance of all tools, we assessed the potential of TF and pathway activities to cluster cells from the same cell type together based on a priori annotated cell types. All pathway analysis tools and the TF analysis tools DoRothEA, D-AUCell, and metaVIPER were readily applicable to the dataset, except for SCENIC, where we first had to infer GRNs specific for each dataset (and thus experimental protocol) from the respective data (e.g., Drop-seq regulons inferred from the Drop-seq dataset; see the “Methods” section). The overlap of all protocol-specific SCENIC regulons comprised only 24 TFs (Additional file 1: Figure S9a). Including regulons from DoRothEA and GTEx shrank the total overlap down to 20 (Additional file 1: Figure S9b). In contrast, high-quality regulons (confidence levels A and B) from DoRothEA and GTEx alone overlapped in 113 TFs. Given the very low regulon overlap between DoRothEA, GTEx, and all protocol-specific SCENIC regulons, we decided to subset DoRothEA and GTEx to their shared TFs while using all available TFs of the protocol-specific SCENIC regulons.
The low overlap of the SCENIC regulons motivated us to investigate the direct functional consequences of their usage. Theoretically, one would expect to retrieve highly similar regulons as they were constructed from the same biological context. We calculated the pairwise (Pearson) correlations of TF activities between the scRNA-seq technologies for each tool. The distribution of correlation coefficients for each tool denotes the consistency of predicted TF activity across the protocols (Additional file 1: Figure S10). The tools DoRothEA, D-AUCell, and metaVIPER all had a similar median Pearson correlation coefficient of ~0.63 and SCENIC of 0.34. This suggests that the predicted TF activities via SCENIC networks are less consistent across the protocols than the TF activities predicted via DoRothEA, D-AUCell, and metaVIPER.
To assess the clustering capacity of TF and pathway activities, we performed our analysis for each scRNA-seq technology separately to identify protocol-specific and protocol-independent trends. We assumed that the cell-

type-specific information should also be preserved in the reduced-dimension space of TF and pathway activities if these meaningfully capture the corresponding functional processes. Hence, we assessed how well the individual clusters correspond to the annotated cell types by a two-step approach. First, we applied UMAP on different input matrices, e.g., TF/pathway activities or gene expression, and then we evaluated how well cells from the same cell type cluster together. We considered silhouette widths as a metric of cluster purity (see the “Methods” section). Intuitively, each cell type should form a distinct cluster. However, some cell types are closely related, such as different T cells (CD4 and CD8) or monocytes (CD14+ and FCGR3A+). Thus, we decided to evaluate the cluster purity at different levels of the cell-type hierarchy from fine-grained to coarse-grained. We started with the hierarchy level 0 where every cell type forms a distinct cluster and ended with the hierarchy level 4 where all PBMC cell types together and the HEK cell line each form a distinct cluster (Fig. 4a). Our main findings rely on hierarchy level 2.
Silhouette widths derived from a set of highly variable genes (HVGs) set the baseline for the silhouette widths derived from pathway/TF activities. We identified the top 2000 HVGs with Seurat [25] using the selection method “vst” as it worked the best in our hands at four out of five hierarchy levels (Additional file 1: Figure S11). For both TF and pathway activity matrices, the number of features available for dimensionality reduction using UMAP was substantially lower (113 TFs for DoRothEA/metaVIPER, up to 400 TFs for SCENIC GRNs and 14 pathways, respectively) than for a gene expression matrix containing the top 2000 HVGs. As the number of available features for dimensionality reduction differs between HVGs, TFs, and pathways, we compared the cluster purity of these input features with a positive and a negative control. The positive control is a gene expression matrix with the top n HVGs and the negative control is a gene expression matrix with n HVGs chosen randomly out of the 2000 HVGs (n equals 14 for pathway analysis and 113 for TF analysis). It should be noted that in terms of TF analysis, the positive and negative controls are only applicable to DoRothEA, D-AUCell, and metaVIPER as they share the same number of features. As the protocol-specific SCENIC GRNs differ in size (Additional file 1: Figure S9a), each network would require its own positive and negative control.
To evaluate the performance of the TF activity inference methods and the utility of TF activity scores, we determined the cluster purity derived from TF activities predicted by DoRothEA, D-AUCell, metaVIPER, and SCENIC, TF expression, and positive and negative controls. scRNA-seq protocols and input matrices used for dimensionality reduction affected cluster purity significantly (two-way ANOVA p values < 2.2e−16 and 4.32e−12, respectively; p values and estimations for corresponding linear model coefficients in Additional file 1: Figure S12a; see the “Methods” section). The cluster purity based on TF activities inferred using DoRothEA and D-AUCell did not differ significantly (Fig. 4b, corresponding plots for all hierarchy levels in Additional file 1: Figure S12b). In addition, the cluster purity of both tools was not significantly worse than the purity based on all 2000 HVGs, though we observed a slight trend indicating a better cluster purity based on HVGs. This trend is expected due to the large difference in available features for dimensionality reduction. Instead, a comparison to the positive and negative controls is more appropriate. Both DoRothEA and D-AUCell performed comparably to the positive control but significantly better than the negative control across all scRNA-seq protocols (TukeyHSD post-hoc-test, adj. p value of 1.26e−4 for DoRothEA and 7.09e−4 for D-AUCell). The cluster purity derived from metaVIPER was significantly worse than for DoRothEA (TukeyHSD post-hoc-test, adj. p value of 0.054) and tended to be worse than for D-AUCell (TukeyHSD post-hoc-test, adj. p value of 0.163) as well. metaVIPER was not significantly better than the negative control. The cluster purity from SCENIC was significantly better than the negative control (TukeyHSD post-hoc-test, adj. p value of 1.11e−6) and comparable to the positive control and thus to DoRothEA and D-AUCell. However, as mentioned above, SCENIC is only partially comparable to the controls and other tools due to the different number of TFs.
Regardless of the underlying TF activity tool, except for metaVIPER, the cluster purity derived from TF activities significantly outperformed the purity derived from TF expression (TukeyHSD post-hoc-test, adj. p value of 5.89e−6 for DoRothEA, 3.85e−5 for D-AUCell, and 4.0e−8 for SCENIC). This underlines the advantage and relevance of using TF activities over the expression of the TF itself (Fig. 4c). With a comparable performance to a similar number of HVGs and also to 2000 HVGs, we concluded that TF activities serve, independently of the underlying scRNA-seq protocol, as a complementary approach for cluster analysis that is based on generally more interpretable cell type markers.
To evaluate the performance of pathway inference methods and the utility of pathway activity scores, we determined cluster purity with pathway matrices generated by different PROGENy versions and P-AUCell. We used 200 and 500 footprint genes per pathway for PROGENy and P-AUCell, respectively, since they provided the best performance in the previous analyses. As observed already for the TF analysis tools, scRNA-seq protocols and matrices used for dimensionality reduction

Fig. 4 Application of TF and pathway analysis tools on a representative scRNA-seq dataset of PBMCs and HEK cells. a Dendrogram showing how
cell lines/cell types are clustered together based on different hierarchy levels. The dashed line marks the hierarchy level 2, where CD4 T cells, CD8
T cells, and NK cells are aggregated into a single cluster. Similarly, CD14+ monocytes, FCGR3A+ monocytes, and dendritic cells are also
aggregated to a single cluster. The B cells and HEK cells are represented by separate, pure clusters. b, d Comparison of cluster purity (clusters are
defined by hierarchy level 2) between the top 2000 highly variable genes and b TF activity and TF expression and d pathway activities. The
dashed line in b separates SCENIC as it is not directly comparable to the other TF analysis tools and controls due to a different number of
considered TFs. c UMAP plots of TF activities calculated with DoRothEA and corresponding TF expression measured by SMART-Seq2 protocol. e
Heatmap of selected TF activities inferred with DoRothEA from gene expression data generated via Quartz-Seq2
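
Cluster purity at a given hierarchy level can be illustrated with a small silhouette-width calculation. The sketch below is a toy in Python under stated assumptions, not the pipeline used in the study: the coordinates stand in for a UMAP embedding, the label spellings and the `LEVEL2` mapping are hypothetical encodings of the level 2 grouping described in the caption, and distances are plain Euclidean.

```python
import math
from statistics import mean

# Hypothetical encoding of hierarchy level 2 (grouping taken from the caption).
LEVEL2 = {
    "CD4 T": "T/NK", "CD8 T": "T/NK", "NK": "T/NK",
    "CD14+ monocyte": "myeloid", "FCGR3A+ monocyte": "myeloid",
    "dendritic": "myeloid",
    "B": "B", "HEK": "HEK",
}


def silhouette_widths(points, labels):
    """Per-cell silhouette width s = (b - a) / max(a, b)."""
    widths = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Collect distances from cell i to every other cell, grouped by cluster.
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(math.dist(p, q))
        # Convention: singleton clusters and single-cluster data score 0.
        if lab not in by_cluster or len(by_cluster) < 2:
            widths.append(0.0)
            continue
        a = mean(by_cluster[lab])                       # cohesion
        b = min(mean(d) for c, d in by_cluster.items()  # nearest other cluster
                if c != lab)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Toy 2D embedding: two neighbouring T cell subsets and distant B cells.
cells = [((0.0, 0.0), "CD4 T"), ((0.0, 1.0), "CD4 T"),
         ((1.0, 0.0), "CD8 T"), ((1.0, 1.0), "CD8 T"),
         ((10.0, 0.0), "B"), ((10.0, 1.0), "B")]
points = [p for p, _ in cells]
fine = [lab for _, lab in cells]
coarse = [LEVEL2[lab] for lab in fine]

fine_purity = mean(silhouette_widths(points, fine))
coarse_purity = mean(silhouette_widths(points, coarse))
print(fine_purity, coarse_purity)
```

In this toy example, merging the closely related T cell subsets raises the mean silhouette width, which is the motivation given in the text for evaluating purity at several hierarchy levels.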

affected cluster purity significantly (two-way ANOVA p values of 2.84e−7 and 1.13e−13, respectively, p values and estimations for corresponding linear model coefficients in Additional file 1: Figure S13a; see the “Methods” section). The cluster purity derived from pathway activity matrices is not significantly different between PROGENy and P-AUCell, while worse than all HVGs (TukeyHSD post-hoc-test, adj. p value of 4.07e−10 for PROGENy and 4.59e−9 for P-AUCell, Fig. 4d, corresponding plots for all hierarchy levels in Additional file 1: Figure S13b). This is expected due to the large difference in the number of available features for

dimensionality reduction (2000 HVGs vs 14 pathways). The cluster purity of both approaches was comparable to the positive control but significantly better than the negative control (TukeyHSD post-hoc-test, adj. p value of 0.077 for PROGENy and 0.013 for P-AUCell vs negative control). In summary, this study indicated that the pathway activities contain relevant and cell-type-specific information, even though they do not capture enough functional differences to be used for effective clustering analysis. Overall, the cluster purity of cells represented by the estimated pathway activities is worse than the cluster purity of cells represented by the estimated TF activities.
In addition, we observed that TF and pathway matrices derived from the Quartz-Seq2 protocol yielded significantly better cluster purity at hierarchy level 2 than all other protocols, which is in agreement with the original study of the PBMC + HEK293T data (Additional file 1: Figure S12a and S13a) [22].
TF and pathway activity scores are more interpretable than the expression of single genes. Hence, we were interested to explore whether we could recover known cell-type-specific TF and pathway activities from the PBMC data. We decided to focus on the dataset measured with Quartz-Seq2 as this protocol showed in our and in the original study superior performance over all other protocols [22]. We calculated mean TF and pathway activity scores for each cell type using DoRothEA, D-AUCell, metaVIPER, and SCENIC (using only TFs with confidence levels A and B, Fig. 4e and Additional file 1: Figure S14a–c, respectively), PROGENy with 500 and P-AUCell with 200 footprint genes per pathway (Additional file 1: Figure S14d and e). In terms of TF activities, we observed high RFXAP, RFXANK, and RFX5 activity (TFs responsible for MHCII expression) in monocytes, dendritic cells, and B cells (the main antigen-presenting cells of the investigated population [26]) (Additional file 1: Figure S14a and b). Myeloid lineage-specific SPI1 activity [27] was observed in monocytes and dendritic cells. The high activity of repressor TF (where regulation directionality is important) FOXP1 in T lymphocytes [28] was only revealed by DoRothEA. Proliferative TFs like Myc and E2F4 also had high activity in HEK cells.
Regarding pathway activities, we observed across both methods, in agreement with the literature, high activity of NFkB and TNFa in monocytes [29] and elevated Trail pathway activity in B cells (Additional file 1: Figure S14d and e) [30]. HEK cells, as expected from dividing cell lines, had higher activity of proliferative pathways (MAPK, EGFR, and PI3K, Additional file 1: Figure S14d). These latter pathway activity changes were only detected by PROGENy but not with AUCell, highlighting the importance of directionality information.
Besides these individual examples, we analyzed the biological relevance of the identified TF activities in more detail. We assumed that the highly active TFs are regulating important cellular functions, resulting in a correlation between TF activity and essentiality. As (to our knowledge) no gene essentiality data is available for PBMCs, we used hematologic cancer (lymphoma and leukemia) gene essentiality data from the DepMap project [31]. We compared the difference between the TF activities in lymphoid (B, T, and NK cells) and myeloid (monocytes and dendritic cells) PBMCs with the TF gene essentiality differences between myeloid and lymphoid hematologic cancers. SPI1, in accordance with its higher activity in myeloid PBMCs, was more essential in myeloid leukemias (Additional file 1: Figure S15a and b, Wilcoxon-test p value = 0.038). For a more comprehensive analysis, we compared the differences in TF activity (PBMCs, lymphoid - myeloid) and the differences in TF gene essentiality (hematologic cancers, lymphoid - myeloid) by calculating their Pearson correlation for all TFs. The TF activities predicted by DoRothEA correlated best with respective essentiality scores across all scRNA-seq protocols (median Pearson correlation coefficient of 0.107; 0.08 for D-AUCell; 0.04 for metaVIPER; and −0.002 for SCENIC, Additional file 1: Figure S15c). The difference in TF activities predicted with DoRothEA from the dataset generated by Smart-Seq2 and Quartz-Seq2 correlated significantly with the difference in essentiality (Pearson correlation, p value of 0.049 and 0.032, respectively). Thus, TF activities predicted with DoRothEA regulons correlate, albeit weakly, with gene/TF essentiality.
In summary, the analysis of this mixture sample demonstrated that summarizing gene expression into TF activities can preserve cell-type-specific information while drastically reducing the number of features. Hence, TF activities could be considered as an alternative to gene expression for clustering analysis. Furthermore, they correlate, albeit weakly, with gene/TF essentiality, suggesting the biological relevance of the identified cell-type-specific TF activities.
We also showed that pathway activity matrices contain cell-type-specific information, too, although we do not recommend using them for clustering analysis as the number of features is too low. In addition, we recovered known pathway/TF cell-type associations showing the importance of directionality and supporting the utility and power of the functional analysis tools DoRothEA and PROGENy.

Discussion
In this paper, we tested the robustness and applicability of functional analysis tools on scRNA-seq data. We included both bulk- and single-cell-based tools that estimate either TF or pathway activities from gene

expression data and for which well-defined benchmark data exist. The bulk-based tools were DoRothEA, PROGENy, and GO gene sets analyzed with GSEA (GO-GSEA). The functional analysis tools specifically designed for the application in single cells were SCENIC, AUCell combined with DoRothEA (D-AUCell) and PROGENy (P-AUCell) gene sets, and metaVIPER.
We first explored the effect of low gene coverage in bulk data on the performance of the bulk-based tools DoRothEA, PROGENy, and GO-GSEA. We found that for all tools the performance dropped with decreasing gene coverage but at a different rate. While PROGENy was robust down to 500 covered genes, DoRothEA’s performance dropped markedly at 2000 covered genes. In addition, the results related to PROGENy suggested that increasing the number of footprint genes per pathway counteracted low gene coverage. GO-GSEA showed the strongest drop and did not perform better than a random guess below 2000 covered genes. Comparing the global performance across all pathways of both pathway analysis tools suggests that footprint-based gene sets are superior to gene sets containing pathway members (e.g., GO gene sets) in recovering perturbed pathways. This observation is in agreement with previous studies conducted by us and others [12, 32]. However, both PROGENy and GO-GSEA performed poorly for some pathways, e.g., the WNT pathway. We reason that this observation might be due to the quality of the corresponding benchmark data [33]. Given this fact and that GO-GSEA cannot handle low gene coverage (in our hands), we concluded that this approach is not suitable for scRNA-seq analysis. Hence, we decided to focus only on PROGENy as the bulk-based pathway analysis tool for the following analyses.
Afterward, we benchmarked DoRothEA, PROGENy, D-AUCell, P-AUCell, and metaVIPER on simulated single cells that we sampled from bulk pathway/TF perturbation samples. We showed that our simulated single cells possess characteristics comparable to real single-cell data, supporting the relevance of this strategy. Different combinations of simulation parameters can be related to different scRNA-seq technologies. For each combination, we provide a recommendation of how to use DoRothEA’s and PROGENy’s gene sets (in terms of confidence level combination or number of footprint genes per pathway) to yield the best performance. It should be noted that our simulation approach, as it is now, allows only the simulation of a homogeneous cell population. This would correspond to a single-cell experiment where the transcriptome of a cell line is profiled. In future work, this simulation strategy could be adapted to account for a heterogeneous dataset that would resemble more realistic single-cell datasets [34, 35].
In terms of TF activity inference, DoRothEA performed best on the simulated single cells followed by D-AUCell and then metaVIPER. Both DoRothEA and D-AUCell shared DoRothEA’s gene set collection but applied different statistics. Thus, we concluded that, in our data, VIPER is more suitable for analyzing scRNA-seq data than AUCell. The tool metaVIPER performed only slightly better than a random model, and since it uses VIPER like DoRothEA, the weak performance must be caused by the selection of the gene set resource. DoRothEA’s gene sets/TF regulons were constructed by integrating different types of evidence spanning from literature-curated to predicted TF-target interactions. For metaVIPER, we used 27 tissue-specific GRNs constructed in a data-driven manner with ARACNe [36], thus containing only predicted TF-target interactions. The finding that especially the high-confidence TF regulons from DoRothEA outperform pure ARACNe regulons is in agreement with previous observations [13, 37] and emphasizes the importance of combining literature-curated resources with in silico predicted resources. Moreover, we hypothesize based on the pairwise comparison that for functional analysis, the choice of gene sets is of higher relevance than the choice of the underlying statistical method.
As one could expect, the single-cell tools D-AUCell and metaVIPER performed better on single cells than on the original bulk samples. This trend becomes more pronounced with increasing library size and number of cells. However, the bulk-based tools performed even better on the simulated single cells than the scRNA-seq-specific tools.
Related to pathway analysis, both PROGENy and P-AUCell performed well on the simulated single cells. The original framework of PROGENy uses a linear model that incorporates individual weights of the footprint genes, denoting the importance and also the sign of the contribution (positive/negative) to the pathway activity score. Those weights cannot be considered when applying AUCell with PROGENy gene sets. The slightly higher performance of PROGENy suggests that individual weights assigned to gene set members can improve the activity estimation of biological processes.
Subsequently, we aimed to validate the functional analysis tools on real single-cell data. While we could not find suitable benchmark data of pathway perturbations, we exploited two independent datasets of TF perturbations to benchmark the TF analysis tools, which we extended with SCENIC. These datasets combined CRISPR-mediated TF knock-out/knock-down (Perturb-Seq and CRISPRi) with scRNA-seq. It should be noted that pooled screenings of gene knock-outs with Perturb-seq suffer from an often faulty assignment of guide RNAs to single cells [38].

Those mislabeled data confound the benchmark as the ground-truth is not reliable. In addition, our definition of true-positives and true-negatives is commonly used for such analyses [4, 13, 37], but it might be incorrect due to indirect and compensatory mechanisms [39]. These phenomena can confound the results of this type of benchmark.
Nevertheless, we showed that DoRothEA’s gene sets were globally effective in inferring TF activity from single-cell data with varying performance dependent on the statistical method used. As already shown in the in silico benchmark, D-AUCell showed a weaker performance than DoRothEA, supporting the conclusion that VIPER performs better than AUCell. Interestingly, metaVIPER’s performance was no better than random across all datasets. metaVIPER used the same statistical method as DoRothEA but different gene set resources. This further supports our hypothesis that the selection of gene sets is more important than the statistical method for functional analysis. This trend is also apparent when comparing the performance of SCENIC and D-AUCell as both rely on the statistical method AUCell but differ in their gene set resource. SCENIC’s performance was consistently weaker than that of D-AUCell. In addition, we found that the gene regulatory networks inferred with the SCENIC workflow covered only a limited number of TFs in comparison to the relatively comprehensive regulons from DoRothEA or GTEx.
Furthermore, the perturbation time had a profound effect on the performance of the tools: while DoRothEA and D-AUCell worked well for a perturbation duration of 6 (CRISPRi) and 7 days (Perturb-Seq (7d)), the performance dropped markedly for 13 days. We reasoned that, within 13 days of perturbation, compensation effects are taking place at the molecular level that confound the prediction of TF activities. In addition, it is possible that cells without a gene edit outgrow cells with a successful knock-out after 13 days as the knock-out typically results in a lower fitness and thus proliferation rate.
In summary, DoRothEA subsetted to confidence levels A and B performed the best on real scRNA-seq data but at the cost of the TF coverage. The results of the in silico and in vitro benchmarks are in agreement. Accordingly, we believe that it is reasonable to assume that PROGENy also works on real data given the positive benchmark results on simulated data.
Finally, we applied our tools of interest to a mixture sample of PBMCs and HEK cells profiled with 13 different scRNA-seq protocols. We investigated to which extent pathway and TF matrices retain cell-type-specific information, by evaluating how well cells belonging to the same cell type or cell type family cluster together in reduced dimensionality space. Given the lower numbers of features available for dimensionality reduction using TF and pathway activities, cell types could be recovered as well as when using the same number of the top highly variable genes. In addition, we showed that cell types could be recovered more precisely using TF activities than TF expression, which is in agreement with previous studies [19]. This suggests that summarizing gene expression as TF and pathway activities can lead to noise filtering, particularly relevant for scRNA-seq data, though TF activities performed better than pathway activities, which is again attributed to the even lower number of pathways. Specifically, TF activities computed with DoRothEA, D-AUCell, and SCENIC yielded a reasonable cluster purity. It should be noted that, while DoRothEA and D-AUCell rely on independent regulons, the SCENIC networks are constructed from the same dataset they are applied to. This poses the risk of overfitting. Across technologies, the TF activities from SCENIC correlated less well than those calculated with the other tools, which is consistent with overfitting by SCENIC, but further analysis is required.
Our analysis suggested at different points that the performance of TF and pathway analysis tools is more sensitive to the selection of gene sets than the statistical methods. In particular, manually curated footprint gene sets seem to perform generally better. This hypothesis could be tested in the future by decoupling functional analysis tools into gene sets and statistics. Benchmarking all possible combinations of gene sets and statistics (e.g., DoRothEA gene sets with a linear model or PROGENy gene sets with VIPER) would shed light on this question, which we believe is of high relevance for the community.

Conclusions
Our systematic and comprehensive benchmark study suggests that functional analysis tools that rely on manually curated footprint gene sets are effective in inferring TF and pathway activity from scRNA-seq data, partially outperforming tools specifically designed for scRNA-seq analysis. In particular, the performance of DoRothEA and PROGENy was consistently better than all other tools. We showed the limits of both tools with respect to low gene coverage. We also provided recommendations on how to use DoRothEA’s and PROGENy’s gene sets in the best way dependent on the number of cells, reflecting the amount of available information, and sequencing depths. Furthermore, we showed that TF and pathway activities are rich in cell-type-specific information with a reduced amount of noise and provide an intuitive way of interpretation and hypothesis generation. We provide our benchmark data and code to the community for further assessment of methods for functional analysis.

Methods
Functional analysis tools, gene set resources, and statistical methods
PROGENy
PROGENy is a tool that infers pathway activity for 14 signaling pathways (Androgen, Estrogen, EGFR, Hypoxia, JAK-STAT, MAPK, NFkB, PI3K, p53, TGFb, TNFa, Trail, VEGF, and WNT) from gene expression data [12, 33]. By default, pathway activity inference is based on gene sets comprising the top 100 most responsive genes upon corresponding pathway perturbation, which we refer to as footprint genes of a pathway. Each footprint gene is assigned a weight denoting the strength and direction of regulation upon pathway perturbation. Pathway scores are computed as a weighted sum over the footprint genes, i.e., the sum of the products of expression and weight.

DoRothEA
DoRothEA is a gene set resource containing signed transcription factor (TF)-target interactions [13]. Those interactions were curated and collected from different types of evidence such as literature-curated resources, ChIP-seq peaks, TF binding site motifs, and interactions inferred directly from gene expression. Based on the amount of supporting evidence, each interaction is accompanied by an interaction confidence level ranging from A to E, with A being the most confident interactions and E the least. In addition, a summary TF confidence level is assigned (also from A to E) which is derived from the leading confidence level of its interactions (e.g., a TF is assigned confidence level A if at least ten targets have confidence level A as well). DoRothEA contains in total 470,711 interactions covering 1396 TFs targeting 20,238 unique genes. We use VIPER in combination with DoRothEA to estimate TF activities from gene expression data, as described in [13].

GO-GSEA
We define GO-GSEA as an analysis tool that couples GO-terms from MSigDB with the GSEA framework [7].

VIPER
VIPER is a statistical framework that was developed to estimate protein activity from gene expression data using enriched regulon analysis performed by the algorithm

metaVIPER
metaVIPER is an extension of VIPER that uses multiple gene regulatory networks [19]. TF activities predicted with each individual gene regulatory network are finally integrated into a consensus TF activity score.

SCENIC
SCENIC is a computational workflow that predicts TF activities from scRNA-seq data [18]. Instead of interrogating predefined regulons, individual regulons are constructed from the scRNA-seq data. First, TF-gene co-expression modules are defined in a data-driven manner with GENIE3. Subsequently, those modules are refined via RcisTarget by keeping only those genes that contain the respective transcription factor binding motif. Once the regulons are constructed, the method AUCell scores individual cells by assessing for each TF separately whether target genes are enriched in the top quantile of the cell signature.

D-AUCell/P-AUCell
The statistical method AUCell is not limited to SCENIC regulons. In principle, it can be combined with any gene set resource. Thus, we coupled AUCell with gene sets from DoRothEA (D-AUCell) and PROGENy (P-AUCell). In comparison to other statistical methods, AUCell does not include weights of the gene set members. Thus, the mode of regulation or the likelihood of TF-target interactions or weights of the PROGENy gene sets are not considered for the computation of TF and pathway activities.

Application of PROGENy on single samples/cells and contrasts
We applied PROGENy on matrices of single samples (genes in rows and either bulk samples or single cells in columns) containing normalized gene expression scores or on contrast matrices (genes in rows and summarized perturbation experiments into contrasts in columns) containing logFCs. In the case of single-sample analysis, the contrasts were built based on pathway activity matrices yielding the change in pathway activity (perturbed samples - control sample) summarized as logFC. Independent of the input matrix, we scaled each pathway to have a mean activity of 0 and a standard deviation of 1.
aREA [15]. It requires information about interactions (if We build different PROGENy versions by varying the
possible signed) between a protein and its transcriptional number of footprint genes per pathway (100, 200, 300,
targets and the likelihood of their interaction. If not fur- 500, 1000 or all which corresponds to ~ 29,000 genes).
ther specified, this likelihood is set to 1. In the original
workflow, this regulatory network was inferred from Application of DoRothEA on single samples/cells and
gene expression by the algorithm ARACNe providing contrasts
mode of regulation and likelihood for each interaction We applied DoRothEA in combination with the statis-
[36]. However, it can be replaced by any other data re- tical method VIPER on matrices of single samples (genes
source reporting protein target interactions. in rows and either bulk samples or single cells in
columns) containing normalized gene expression scores scaled gene-wise to a mean value of 0 and a standard deviation of 1, or on contrast matrices (genes in rows and summarized perturbation experiments into contrasts in columns) containing logFCs. In the case of single sample analysis, the contrasts were built based on TF activity matrices yielding the change in TF activity (perturbed samples - control sample) summarized as logFC. TFs with fewer than four targets listed in the corresponding gene expression matrix were discarded from the analysis. VIPER provides a normalized enrichment score (NES) for each TF, which we consider as a metric for the activity. We used the R package viper (version 1.17.0) [15] to run VIPER in combination with DoRothEA.

Application of GO-GSEA sets on contrasts
We applied GSEA with GO gene sets on contrast matrices (genes in rows and summarized perturbation experiments into contrasts in columns) containing logFCs, which also serve as the gene-level statistic. We selected only those GO terms which map to PROGENy pathways in order to guarantee a fair comparison between both tools. For the enrichment analysis, we used the R package fgsea (version 1.10.0) [40] with 1000 permutations per gene signature.

Application of metaVIPER on single samples
We ran metaVIPER with 27 tissue-specific gene regulatory networks which we had constructed for one of our previous studies [13]. Those tissue-specific gene regulatory networks were derived using ARACNe [36], taking the database GTEx [41] as the tissue-specific gene expression sample resource. We applied metaVIPER on matrices of single samples (genes in rows and single cells in columns) containing normalized gene expression scores scaled gene-wise to a mean value of 0 and a standard deviation of 1. If required, contrasts were built based on TF activity matrices yielding the change in TF activity (perturbed samples - control sample) summarized as logFC. TFs with fewer than four targets listed in the corresponding input matrix were discarded from the analysis. metaVIPER provides a NES integrated across all regulatory networks for each TF, which we consider as a metric for the activity. We used the R package viper (version 1.17.0) [15] to run metaVIPER.

Application of AUCell with either SCENIC, DoRothEA, or PROGENy gene sets on single samples
AUCell is a statistical method that determines, specifically for single cells, whether a given gene set is enriched in the top quantile of a ranked gene signature. To this end, AUCell determines the area under the recovery curve to compute the enrichment score. We defined the top quantile as the top 5% of the ranked gene signature. We applied this method coupled with SCENIC, PROGENy, and DoRothEA gene sets. Before applying this method with PROGENy gene sets, we subsetted the footprint gene sets to contain only genes available in the provided gene signature. This guarantees a fair comparison, because the original PROGENy framework with a linear model considers only the intersection of footprint (gene set) members and signature genes. We applied AUCell with SCENIC, PROGENy, and DoRothEA gene sets on matrices of single samples (genes in rows and single cells in columns) containing raw gene counts. Contrasts were built based on the respective TF/pathway activity matrices yielding the change in TF/pathway activity (perturbed samples - control sample) summarized as logFC. For the AUCell analysis, we used the R package AUCell (version 1.5.5) [18].

Induction of artificial low gene coverage in bulk microarray data
We induce the reduction of gene coverage by inserting zeros on the contrast level. Specifically, for each contrast separately, we randomly insert zeros until a predefined number of genes with a nonzero logFC remains, which we consider as “covered”/“measured” genes. We perform this analysis for a gene coverage of 500, 1000, 2000, 3000, 5000, 7000, 8000 and, as reference, all available genes. To account for stochasticity effects when randomly inserting zeros, we repeat this analysis 25 times for each gene coverage value.

Simulation of single cells
Let C be a vector representing counts per gene for a single bulk sample. C is normalized for gene length and library size, resulting in vector B containing TPM values per gene. We assume that samples are obtained from homogeneous cell populations and that the probability of a dropout event is inversely proportional to the relative TPM of each measured gene in the bulk sample. Therefore, we define a discrete cumulative distribution function from the vector of gene frequencies $P = \frac{B}{|B|}$. To simulate a single cell from this distribution, we draw and aggregate L samples by inverse transform sampling. L corresponds to the library size for the count vector of the simulated single cell. We draw L from a normal distribution $N(\mu, \mu/2)$.
To benchmark the robustness of the methods, we vary the number of cells sampled from a single bulk sample (1, 10, 20, 30, 50, 100) and the value of μ (1000, 2000, 5000, 10,000, 20,000). To account for stochasticity effects during sampling, we repeat this analysis 25 times for each parameter combination.
Prior to normalization, we discarded cells with a library size lower than 100. We normalized the count matrices of the simulated cells by using the R package scran (version 1.11.27) [42]. Contrast matrices were
constructed by comparing cells originating from one of the perturbation bulk samples vs cells originating from one of the control bulk samples.

Gene regulatory network (GRN) reconstruction using SCENIC
We infer GRNs on individual sub-datasets using the SCENIC (v. 1.1.2-2) workflow [18]. In brief, gene expression was filtered using default parameters and log2-transformed for co-expression analysis following the recommendations by the authors. We identified potential targets of transcription factors (TFs) based on their co-expression with TFs using GENIE3 (v. 1.6.0, Random Forest with 1000 trees). We pruned co-expression modules to retrieve only putative direct-binding interactions using RcisTarget (v. 1.4.0) and the cis-regulatory DNA-motif databases for the hg38 human genome assembly (Version 9 - mc9nr, with distances TSS+/− 10kbp and 500bpUp100Dw, from [Link] target/) with default parameters. Only modules with a significant motif enrichment of the TF upstream were kept for the final GRN. While we were running the workflow, 75 genes out of 27,091 from the first DNA-motif database (TSS+/− 10kbp) were inconsistent, i.e., were not described in the second one (500bpUp100Dw), leading to an error in the workflow execution. Thus, these 75 genes were discarded from the database to complete the workflow.

Benchmarking process with ROC and PR metrics
To transform the benchmark into a binary setup, all activity scores of experiments with a negative perturbation effect (inhibition/knockdown) are multiplied by −1. This guarantees that TFs/pathways belong to a binary class, either deregulated or not regulated, and that the perturbed pathway/TF has in the ideal case the highest activity.
We performed the ROC and PR analysis with the R package yardstick (version 0.0.3; [Link] tidymodels/yardstick). For the construction of ROC and PR curves, we calculated pathway (or TF) activities for each perturbation experiment. As each perturbation experiment targets a single pathway (or TF), only the activity score of the perturbed pathway (or TF) is associated with the positive class (e.g., EGFR pathway activity score in an experiment where EGFR was perturbed). Accordingly, the activity scores of all non-perturbed pathways (or TFs) belong to the negative class (e.g., EGFR pathway activity score in an experiment where the JAK-STAT pathway was perturbed). Using these positive and negative classes, Sensitivity/(1-Specificity) or Precision/Recall values were calculated at different thresholds of activity, producing the ROC/PR curves.

Collecting, curating, and processing of transcriptomic data

General robustness study
We extracted single-pathway and single-TF perturbation data profiled with microarrays from a previous study conducted by us [33]. We followed the same procedure of collecting, curating, and processing the data as described in the previous study.

In silico benchmark
For the simulation of single cells, we collected, curated, and processed single TF and single pathway perturbation data profiled with bulk RNA-seq. We downloaded basic metadata of single TF perturbation experiments from the ChEA3 web-server ([Link] chea3/) [37] and refined the experiment and sample annotation (Additional file 2). Metadata of single pathway perturbation experiments were manually extracted by us from Gene Expression Omnibus (GEO) [43] (Additional file 3). Count matrices for all those experiments were downloaded from ARCHS4 ([Link] [Link]/archs4/) [44].
We normalized count matrices by first calculating normalization factors and second transforming count data to log2 counts per million (CPM) using the R packages edgeR (version 3.25.8) [45] and limma (version 3.39.18) [46], respectively.

In vitro benchmark
To benchmark VIPER on real single-cell data, we inspected related literature and identified two publications which systematically measure the effects of transcription factors on gene expression in single cells:
Dixit et al. introduced Perturb-seq and measured the knockout effects of ten transcription factors on K562 cells 7 and 13 days after transduction [20]. We downloaded the expression data from GEO (GSM2396858 and GSM2396859) and sgRNA-cell mappings made available by the author upon request in the files promoters_concat_all.csv (for GSM2396858) and pt2_concat_all.csv (for GSM2396859) on [Link]/asncd/MIMOSCA. We did not consider the High MOI dataset due to the expected high number of duplicate sgRNA assignments. Cells were quality filtered based on expression, keeping the upper half of cells for each dataset. Only sgRNAs detected in at least 30 cells were used. For the day 7 dataset, 16,507 cells remained for benchmarking, and for the day 13 dataset, 9634 cells.
Genga et al. measured knockdown effects of 50 transcription factors implicated in human definitive endoderm differentiation using a CRISPRi variant of CROPseq in human embryonic stem cells 6 days after transduction [21]. We obtained data of both replicates from GEO (GSM3630200, GSM3630201), which include
sgRNA counts alongside the rest of the transcriptome. We refrained from using the targeted sequencing of the sgRNA in GSM3630202, GSM3630203 as it contained less clear mappings due to amplification noise. Expression data lacked information on mitochondrial genes, and therefore, no further quality filtering of cells was performed. From this dataset, only sgRNAs detected in at least 100 cells were used. A combined 5282 cells remained for benchmarking.
Analysis was limited to the 10,000 most expressed genes for all three datasets.
We normalized the count matrices for each individual dataset (Perturb-Seq (7d), Perturb-Seq (13d), and CRISPRi) separately by using the R package scran (version 1.11.27) [42].

Human Cell Atlas study
This scRNA-seq dataset originates from a benchmark study of the Human Cell Atlas project and is available on GEO (GSE133549) [22]. The dataset consists of PBMCs and a HEK293T sample which was analyzed with 13 different scRNA-seq technologies (CEL-Seq2, MARS-Seq, Quartz-Seq2, gmcSCRB-Seq, ddSEQ, ICELL8, C1HT-Small, C1HT-Medium, Chromium, Chromium(sn), Drop-seq, inDrop). Most cells are annotated with a specific cell type/cell line (CD4 T cells, CD8 T cells, NK cells, B cells, CD14+ monocytes, FCGR3A+ monocytes, dendritic cells, megakaryocytes, HEK cells). Megakaryocytes (due to their low abundance) and cells without annotation were discarded from this analysis.
We normalized the count matrices for each technology separately by using the R package scran (version 1.11.27) [42].

Dimensionality reduction with UMAP and assessment of cluster purity
We used the R package umap (version [Link]) calling the Python implementation of Uniform Manifold Approximation and Projection (UMAP) with the argument “method = ‘umap-learn’” to perform dimensionality reduction on various input matrices (gene expression matrix, pathway/TF activity matrix, etc.). We assume that the dimensionality reduction will result in a clustering of cells that corresponds well to the cell type/cell type family. To assess the validity of this assumption, we assigned a cell-type/cell family-specific cluster-id to each point in the low-dimensional space. We then defined a global cluster purity measure based on silhouette widths [47], which is a well-known clustering quality measure.
Given the cluster assignments in the low-dimensional space, for each cell, the average distance (a) to the cells that belong to the same cluster is calculated. Then, the smallest average distance (b) to all cells belonging to the nearest foreign cluster is calculated. The difference between the latter and the former indicates the width of the silhouette for that cell, i.e., how well the cell is embedded in the assigned cluster. To make the silhouette widths comparable, they are normalized by dividing the difference by the larger of the two average distances: $s = \frac{b - a}{\max(a, b)}$. Therefore, the possible values for the silhouette widths lie in the range −1 to 1, where higher values indicate good cluster assignment, while lower values close to 0 indicate poor cluster assignment. Finally, the average silhouette width for every cluster is calculated, and averages are aggregated to obtain a measure of the global purity of clusters. For the silhouette analysis, we used the R package cluster (version 2.0.8).
For statistical analysis of cluster quality, we fitted a linear model score = f(scRNA-seq protocol + input matrix), where score corresponds to the average silhouette width for a given scRNA-seq protocol - input matrix pair. Protocol and input matrix are factors, with reference level Quartz-Seq2 and positive control, respectively. We fitted two separate linear models for transcription factor and pathway activity inference methods. We report the estimates and p values for the different coefficients of these linear models. Based on these linear models, we performed a two-way ANOVA and pairwise comparisons using the TukeyHSD post hoc test.

Comparison of PBMCs TF activity with gene essentiality
For each scRNA-seq technology and TF analysis tool used, we calculated mean TF expression for each PBMC type. To focus solely on PBMCs, cells classified as HEK cells or unknown were discarded from this analysis. In addition, we removed megakaryocytes because their abundance was in general too low across all technologies. We used the DepMap shRNA screen [31] as gene essentiality data. As a given TF can either increase proliferation (oncogene) or decrease it (tumor suppressor), we can expect either negative or positive correlation (respectively) between gene essentiality and TF activity. To correct for this effect, we calculated Pearson correlations between TF expression (from CCLE data [48]) and TF essentiality for each TF and multiplied TF essentiality values by the sign of this correlation coefficient. For categorizing hematologic cancers into myeloid and lymphoid groups, we used CCLE metadata (Additional file 4). Specifically, we classified myeloid leukemias as myeloid cancers and lymphoid leukemias and lymphomas as lymphoid cancers. Ambiguous cancer types were removed from our analysis.
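
The sign correction described under “Comparison of PBMCs TF activity with gene essentiality” can be illustrated with a minimal Python sketch. It is not the original implementation, which was written in R on CCLE and DepMap data: the function names and the toy values below are hypothetical and serve only to show the idea of multiplying the essentiality values of one TF by the sign of the Pearson correlation between its expression and its essentiality.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def sign_corrected_essentiality(expression, essentiality):
    """Multiply essentiality values by the sign of their correlation with expression."""
    r = pearson(expression, essentiality)
    sign = 1.0 if r >= 0 else -1.0
    return [sign * value for value in essentiality]


# Toy values for one TF across four cell lines (illustrative only, not real data).
tf_expression = [1.0, 2.0, 3.0, 4.0]
tf_essentiality = [-0.2, -0.5, -0.9, -1.4]

corrected = sign_corrected_essentiality(tf_expression, tf_essentiality)
```

In this toy case, expression and essentiality are negatively correlated, so every essentiality value is flipped in sign; a TF with a positive correlation would keep its values unchanged.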
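
The sampling scheme from “Simulation of single cells” can likewise be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors’ R code: the helper name simulate_cell and the toy TPM vector are hypothetical, and the second parameter of the normal distribution is assumed here to be a standard deviation of μ/2.

```python
import bisect
import itertools
import random


def simulate_cell(tpm, mu, rng):
    """Simulate one single-cell count vector from a bulk TPM vector (hypothetical helper)."""
    total = sum(tpm)
    freqs = [value / total for value in tpm]   # gene frequencies P = B / |B|
    cdf = list(itertools.accumulate(freqs))    # discrete cumulative distribution

    # ASSUMPTION: mu / 2 is treated as the standard deviation of the normal distribution.
    lib_size = max(0, round(rng.gauss(mu, mu / 2)))

    counts = [0] * len(tpm)
    for _ in range(lib_size):
        u = rng.random()                       # uniform draw in [0, 1)
        # Inverse transform: first gene whose cumulative frequency exceeds u.
        gene = min(bisect.bisect_right(cdf, u), len(tpm) - 1)
        counts[gene] += 1
    return counts


rng = random.Random(1)                         # fixed seed for reproducibility
bulk_tpm = [10.0, 0.0, 30.0, 60.0]             # toy TPM values, not real data
cells = [simulate_cell(bulk_tpm, mu=1000, rng=rng) for _ in range(10)]

# Discard cells with a library size lower than 100, as described in the Methods.
cells = [cell for cell in cells if sum(cell) >= 100]
```

Genes with a TPM of zero are never drawn, and lowly expressed genes are frequently missed at small library sizes, which is how this scheme mimics dropout events.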

Supplementary information
Supplementary information accompanies this paper at [Link] 1186/s13059-020-1949-z.

Additional file 1. Supplementary figures S1-S15.
Additional file 2. Metadata of bulk RNA-seq TF perturbation data, including among others perturbation target, perturbation direction, GEO accession ID and annotated GEO sample IDs (whether samples belong to control or perturbation group). Those data were used to simulate single cells. (CSV 13 kb)
Additional file 3. Metadata of bulk RNA-seq pathway perturbation data, including among others perturbation target, perturbation direction, GEO accession ID and annotated GEO sample IDs (whether samples belong to control or perturbation group). Those data were used to simulate single cells. (CSV 1 kb)
Additional file 4. Manual classification of selected hematologic cancer cell lines from the CCLE database into myeloid (M) or lymphoid (L) cancer. (CSV 2 kb)
Additional file 5. Review history.

Acknowledgements
We thank Aurélien Dugourd and Ricardo Ramirez-Flores for helpful discussions. We also thank Minoo Ashtiani for supporting the collection of single pathway perturbation experiments on bulk level.

Peer review information
Yixin Yao was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Review history
The review history is available as Additional file 5.

Authors’ contributions
CHH, BS, and JSR designed the research. CHH performed the analyses and drafted the manuscript. BS and JT (supervised by JSR) and MPK and BAJ (both supervised by DAL) supported the development of the single-cell simulation strategy. JT and JPP set up the cluster infrastructure for the simulation. JG (supervised by OS) processed the real scRNA-seq data. JPP constructed the SCENIC gene regulatory networks. EM and HH provided the PBMC single-cell data and supported the corresponding analysis. BS performed the blood cancer analysis. BS, JT, JG, JPP, and JSR contributed to the manuscript writing. JSR and BS supervised the project. All authors read, commented, and approved the final manuscript.

Funding
CHH is supported by the German Federal Ministry of Education and Research (BMBF)-funded project Systems Medicine of the Liver (LiSyM, FKZ: 031 L0049). MPK, BAJ, and DAL are supported by NIH Grant U54-CA217377. BS is supported by the Premium Postdoctoral Fellowship Program of the Hungarian Academy of Sciences. HH is a Miguel Servet (CP14/00229) researcher funded by the Spanish Institute of Health Carlos III (ISCIII). This work has received funding from the Ministerio de Ciencia, Innovación y Universidades (SAF2017-89109-P; AEI/FEDER, UE).

Availability of data and materials
The code to perform all presented studies is written in R [49, 50] and is freely available on GitHub: [Link] scRNAseq [51]. The datasets supporting the conclusions of this article are available at Zenodo: [Link] [52].

Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Competing interests
The authors declare that they have no competing interests.

Author details
1 Institute for Computational Biomedicine, Bioquant, Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Heidelberg, Germany. 2 Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of Medicine, Aachen, Germany. 3 Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia. 4 German Cancer Research Center (DKFZ), Heidelberg, Germany. 5 European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany. 6 Department of Biological Engineering, MIT, Cambridge, MA, USA. 7 CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain. 8 Koch Institute for Integrative Cancer Biology, MIT, Cambridge, MA, USA. 9 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK. 10 Universitat Pompeu Fabra (UPF), Barcelona, Spain. 11 Faculty of Medicine, Department of Physiology, Semmelweis University, Budapest, Hungary.

Received: 3 September 2019 Accepted: 29 January 2020

References
1. Essaghir A, Toffalini F, Knoops L, Kallin A, van Helden J, Demoulin J-B. Transcription factor regulation can be accurately predicted from the presence of target gene signatures in microarray gene expression data. Nucleic Acids Res. 2010;38:e120. Available from: [Link] gkq149.
2. Hung J-H, Yang T-H, Hu Z, Weng Z, DeLisi C. Gene set enrichment analysis: performance evaluation and usage guidelines. Brief Bioinform. 2012;13:281–91. Available from: [Link]
3. Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol. 2012;8:e1002375. Available from: [Link]
4. Nguyen T-M, Shafi A, Nguyen T, Draghici S. Identifying significantly impacted pathways: a comprehensive review and assessment. Genome Biol. 2019;20:203. Available from: [Link]
5. Liberzon A, Birger C, Thorvaldsdottir H, Ghandi M, Mesirov JP, Tamayo P. The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25. [Link] Epub 2016/01/16. PMID: 26771021.
6. Fisher RA. Statistical methods for research workers [Internet]: Genesis Publishing Pvt Ltd; 2006. Available from: [Link] Fisher/Methods/[Link]
7. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102:15545–50. Available from: [Link]
8. Trescher S, Münchmeyer J, Leser U. Estimating genome-wide regulatory activity from multi-omics data sets using mathematical optimization. BMC Syst Biol. 2017;11:41. Available from: [Link]
9. Pang H, Lin A, Holford M, Enerson BE, Lu B, Lawton MP, et al. Pathway analysis using random forests classification and regression. Bioinformatics. 2006;22:2028–36. Available from: [Link]
10. Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009;6:377–82. Available from: [Link]
11. Stegle O, Teichmann SA, Marioni JC. Computational and analytical challenges in single-cell transcriptomics. Nat Rev Genet. 2015;16:133–45. Available from: [Link]
12. Schubert M, Klinger B, Klünemann M, Sieber A, Uhlitz F, Sauer S, et al. Perturbation-response genes reveal signaling footprints in cancer gene expression [Internet]. Nature Communications. 2018. Available from: https://[Link]/10.1038/s41467-017-02391-6.
13. Garcia-Alonso L, Holland CH, Ibrahim MM, Turei D, Saez-Rodriguez J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 2019;29:1363–75. Available from: [Link]
14. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Michael Cherry J, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000:25–9. Available from: [Link]
15. Alvarez MJ, Shen Y, Giorgi FM, Lachmann A, Ding BB, Ye BH, et al. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat Genet. 2016;48:838–47. Available from: [Link]
16. Dugourd A, Saez-Rodriguez J. Footprint-based functional analysis of multi-omic data. Current Opinion in Systems Biology: Elsevier; 2019. Available from: [Link]
17. Cantini L, Calzone L, Martignetti L, Rydenfelt M, Blüthgen N, Barillot E, et al. Classification of gene signatures for their information value and functional redundancy. NPJ Syst Biol Appl. 2018. Available from: [Link] 1038/s41540-017-0038-8.
18. Aibar S, González-Blas CB, Moerman T, Huynh-Thu VA, Imrichova H, Hulselmans G, et al. SCENIC: single-cell regulatory network inference and clustering. Nat Methods. 2017;14:1083–6. Available from: [Link] 1038/nmeth.4463.
19. Ding H, Douglass EF Jr, Sonabend AM, Mela A, Bose S, Gonzalez C, et al. Quantitative assessment of protein activity in orphan tissues and single cells using the metaVIPER algorithm. Nat Commun. 2018;9:1471. Available from: [Link]
20. Dixit A, Parnas O, Li B, Chen J, Fulco CP, Jerby-Arnon L, et al. Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell. 2016;167:1853–66.e17. Available from: https://[Link]/10.1016/[Link].2016.11.038.
21. Genga RMJ, Kernfeld EM, Parsi KM, Parsons TJ, Ziller MJ, Maehr R. Single-cell RNA-sequencing-based CRISPRi screening resolves molecular drivers of early human endoderm development. Cell Rep. 2019;27:708–18.e10. Available from: [Link]
22. Mereu E, Lafzi A, Moutinho C, Ziegenhain C, MacCarthy DJ, Alvarez A, et al. Benchmarking single-cell RNA sequencing protocols for cell atlas projects. BioRxiv. 2019; [Link]. Available from: [Link] 0.1101/[Link].
23. Kharchenko PV, Silberstein L, Scadden DT. Bayesian approach to single-cell differential expression analysis. Nat Methods. 2014;11:740–2. Available from: [Link]
24. Regev A, Teichmann SA, Lander ES, Amit I, Benoist C. Science forum: the human cell atlas. Elife. 2017; [Link]. Available from: https://[Link]/articles/27041/[Link].
25. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36:411–20. Available from: [Link]
26. Burd AL, Ingraham RH, Goldrick SE, Kroe RR, Crute JJ, Grygon CA. Assembly of major histocompatibility complex (MHC) class II transcription factors: association and promoter recognition of RFX proteins. Biochemistry. 2004;43:12750–60. Available from: [Link]
27. Zakrzewska A, Cui C, Stockhammer OW, Benard EL, Spaink HP, Meijer AH. Macrophage-specific gene functions in Spi1-directed innate immunity. Blood. 2010;116:e1–11. Available from: [Link]
28. Feng X, Wang H, Takata H, Day TJ, Willen J, Hu H. Transcription factor Foxp1 exerts essential cell-intrinsic regulation of the quiescence of naive T cells. Nat Immunol. 2011;12:544–50. Available from: [Link]
29. Liu T, Zhang L, Joo D, Sun S-C. NF-κB signaling in inflammation. Signal Transduct Target Ther. 2017;2. Available from: [Link] sigtrans.2017.23.
30. Staniek J, Lorenzetti R, Heller B, Janowska I, Schneider P, Unger S, et al. TRAIL-R1 and TRAIL-R2 mediate TRAIL-dependent apoptosis in activated primary human B lymphocytes. Front Immunol. 2019;10:951. Available from: [Link]
31. McFarland JM, Ho ZV, Kugener G, Dempster JM, Montgomery PG, Bryan JG, et al. Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration. Nat Commun. 2018;9:4610. Available from: [Link]
32. Parikh JR, Klinger B, Xia Y, Marto JA, Blüthgen N. Discovering causal signaling pathways through gene-expression patterns. Nucleic Acids Res. 2010;38:W109–17. Available from: [Link]
33. Holland CH, Szalai B, Saez-Rodriguez J. Transfer of regulatory knowledge from human to mouse for functional genomics analysis. Biochim Biophys Acta Gene Regul Mech. 2019:194431. Available from: [Link] [Link].2019.194431.
34. Zappia L, Phipson B, Oshlack A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 2017;18:174. Available from: [Link] 10.1186/s13059-017-1305-0.
35. Peng T, Zhu Q, Yin P, Tan K. SCRABBLE: single-cell RNA-seq imputation constrained by bulk RNA-seq data. Genome Biol. 2019;20:88. Available from: [Link]
36. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006;7(Suppl 1):S7. Available from: [Link]
37. Keenan AB, Torre D, Lachmann A, Leong AK, Wojciechowicz ML, Utti V, et al. ChEA3: transcription factor enrichment analysis by orthogonal omics integration. Nucleic Acids Res. 2019. Available from: [Link] nar/gkz446.
38. Hegde M, Strand C, Hanna RE, Doench JG. Uncoupling of sgRNAs from their associated barcodes during PCR amplification of combinatorial CRISPR screens. PLoS One. 2018;13:e0197547. Available from: [Link] 1371/[Link].0197547.
39. Smits AH, Ziebell F, Joberty G, Zinn N, Mueller WF, Clauder-Münster S, et al. Biological plasticity rescues target activity in CRISPR knock outs. Nat Methods. 2019;16:1087–93. Available from: [Link] 019-0614-5.
40. Sergushichev A. An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. bioRxiv. 2016:060012 [cited 2018 Jul 17]. Available from: [Link] 06/20/[Link].
41. Carithers LJ, Ardlie K, Barcus M, Branton PA, Britton A, Buia SA, et al. A novel approach to high-quality postmortem tissue procurement: the GTEx project. Biopreserv Biobank. 2015;13:311–9. Available from: [Link] bio.2015.0032.
42. Lun ATL, McCarthy DJ, Marioni JC. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 2016;5:2122. Available from: [Link]
43. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30:207–10. Available from: [Link]
44. Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, et al. Massive mining of publicly available RNA-seq data from human and mouse. Nat Commun. 2018;9:1366. Available from: [Link] 018-03751-6.
45. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40. Available from: [Link] bioinformatics/btp616.
46. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47. Available from: [Link] gkv007.
47. Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65. Available from: [Link]
48. Ghandi M, Huang FW, Jané-Valbuena J, Kryukov GV, Lo CC, McDonald ER 3rd, et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature. 2019;569:503–8. Available from: [Link] 1038/s41586-019-1186-3.
49. Core Team R, et al. R: a language and environment for statistical computing. Vienna: R Foundation for statistical computing; 2013.
50. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, et al. Welcome to the Tidyverse. JOSS. 2019;4:1686. Available from: [Link] [Link]/papers/10.21105/joss.01686.
51. Holland CH, Tanevski J, Perales-Patón J, Gleixner J, Kumar MP, Mereu E, et al. Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data. GitHub. 2020. Available from: https://[Link]/saezlab/FootprintMethods_on_scRNAseq.
52. Holland CH, Saez-Rodriguez J. Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data. 2019. Available from: [Link]

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

