CoxKAN: Kolmogorov-Arnold Networks for Interpretable, High-Performance Survival Analysis
Abstract
Survival analysis is a branch of statistics used for modeling the time until a
specific event occurs and is widely used in medicine, engineering, finance, and
many other fields. When choosing survival models, there is typically a trade-off
between performance and interpretability, where the highest performance is
achieved by black-box models based on deep learning. This is a major problem in
fields such as medicine where practitioners are reluctant to blindly trust black-box
models to make important patient decisions. Kolmogorov-Arnold Networks
(KANs) were recently proposed as an interpretable and accurate alternative to
multi-layer perceptrons (MLPs). We introduce CoxKAN, a Cox proportional
hazards Kolmogorov-Arnold Network for interpretable, high-performance survival
analysis. We evaluate the proposed CoxKAN on 4 synthetic datasets and
9 real medical datasets. The synthetic experiments demonstrate that CoxKAN
accurately recovers interpretable symbolic formulae for the hazard function, and
effectively performs automatic feature selection. Evaluation on the 9 real datasets
shows that CoxKAN consistently outperforms the Cox proportional hazards model
and achieves performance that is superior or comparable to that of tuned MLPs.
Furthermore, we find that CoxKAN identifies complex interactions between predictor
variables that would be extremely difficult to recognise using existing
survival methods, and automatically finds symbolic formulae which uncover the
precise effect of important biomarkers on patient risk.
1 Introduction
Survival analysis - also called time-to-event analysis - is a set of statistical methods
used for modelling the time until a specific event occurs, such as death, failure, or
relapse. It is crucial to various fields, including medicine, engineering, economics, and
insurance, where understanding the timing and probability of events can significantly
impact decision-making. For example, survival models are used extensively in oncology
(the study of cancer) to identify biomarkers/prognostic factors [1–3], assess treatment
efficacy [4–7], and develop personalized treatment plans [8].
Arguably, the most common survival model is the Cox proportional hazards model
(CoxPH) [9], which assumes a linear relationship between the patient’s¹ covariates
(e.g., age, blood pressure etc.) and the log-partial hazard, which is a measure of the
patient’s risk of event-occurrence (see Section 2.1.1). This model has the benefit of
interpretability (we can see exactly how each covariate impacts risk), but the linear
assumption is often overly simplistic and can cause significant bias error. Methods
based on machine learning generally have less bias and therefore potentially better
performance. These include models such as random survival forests [10, 11], Bayesian
models based on Gaussian processes [12, 13] and dependent logistic regression [14].
The most powerful survival models are those based on deep learning, which was first
shown with “DeepSurv” [8], a deep neural network based on CoxPH. Deep learning
models also have the advantage of being able to handle diverse input modalities, from
unstructured data such as images to structured datasets like tabular health records,
making them highly adaptable for multiple healthcare applications. Deep learning has
been used extensively for survival analysis, achieving state-of-the-art performance on
numerous datasets across many domains [15–21]. However, the increased complexity
associated with deep learning comes at the expense of interpretability, with multi-layer
perceptrons (MLPs) sometimes being referred to as “black boxes”. As a result,
these methods have had limited clinical adoption and the search for more interpretable
techniques is an active area of research [22–24].
Kolmogorov-Arnold Networks (KANs) [25] were recently introduced as an alternative
to MLPs, demonstrating enhanced interpretability and accuracy. This approach
differs from MLPs by using learnable activation functions on edges of the network
instead of linear weights, and summing those activation functions on nodes (“neurons”).
These learnable activation functions are parameterised as B-spline curves
with learnable coefficients (see Section 2.2.1) to allow them to approximate any univariate
function. The interpretability of KANs stems from the ability to fit symbolic
operators to the learned activation functions, leaving a symbolic formula in place of
the network. In the original paper, KANs were shown to be useful in physics for
solving partial differential equations and extracting mobility edges in the context of
Anderson localization. Since then, extensive applications of KANs have been found,
including time series analysis [26, 27], medical image segmentation [28] and satellite
image classification [29].
¹ We adopt medical terminology when discussing survival data (e.g., “patient”), but we emphasise that the
methods introduced in this paper are general and can be applied to survival analysis in any domain.
In this work we introduce CoxKAN², the first KAN-based framework
for interpretable survival analysis. CoxKAN uses a fast approximation to the Cox
loss to address KANs’ slow training time; pruning of activation functions to enable
automatic feature selection; and symbolic regression with PySR [30] to better control
an unconventional type of bias-variance tradeoff when finding symbolic representations
of KANs. The key contributions of this paper are in demonstrating that (a) CoxKAN
finds interpretable symbolic formulas for the hazard function, (b) CoxKAN identifies
biomarkers and complex variable interactions, and (c) CoxKAN achieves performance
that is superior to CoxPH and consistent with or better than DeepSurv (the equivalent
MLP-based model).
The paper is organised as follows: In Section 2.1 we describe the theory of survival
analysis. In Section 2.2 we explain the theory and implementation of Kolmogorov-
Arnold Networks. In Section 3 we describe the CoxKAN framework and training
pipeline. In Section 4, we present the experimental results from 3 categories of exper-
iments (synthetic data, clinical data, high dimensional genomics data). Finally, we
conclude in Section 5 by discussing the key takeaways and potential impact.
2 Preliminaries
2.1 Survival Analysis
Survival time is typically described using the survival function and the hazard function.
Let T be the time until the event of interest occurs, with probability density function
f (t). The survival function S(t) = P (T ≥ t) is the probability that a patient survives
longer than time t. The hazard function h(t) is the instantaneous event probability
density at time t, given the patient has survived up to at least that time. Formally, it
is written
$$
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}. \tag{1}
$$
This gives us the probability density function as f (t) = h(t)S(t). It can be shown that
the survival function is related to the hazard function by:
$$
S(t) = \exp\left(-\int_0^t h(s)\,ds\right). \tag{2}
$$
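As a quick numerical illustration of Eq. (2), the following minimal Python sketch integrates a hazard function with the trapezoidal rule and exponentiates the negative result. The function name `survival_from_hazard` and the example hazard h(s) = 2s are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def survival_from_hazard(hazard, t, n_steps=1000):
    """Approximate S(t) = exp(-integral_0^t h(s) ds) with the trapezoidal rule.

    `hazard` is any vectorised callable h(s); the name and signature are
    hypothetical, chosen only for this illustration.
    """
    s = np.linspace(0.0, t, n_steps + 1)
    h = hazard(s)
    # Trapezoidal approximation of the cumulative hazard H(t).
    cumulative_hazard = np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(s))
    return np.exp(-cumulative_hazard)

# Example: h(s) = 2s gives H(t) = t^2, so S(t) should equal exp(-t^2).
linear_hazard = lambda s: 2.0 * s
print(survival_from_hazard(linear_hazard, 1.5), np.exp(-1.5 ** 2))
```

Multiplying the returned value by h(t) gives the density f(t) = h(t)S(t) stated in the text.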
Survival data for a given patient comprises three parts: i) covariates x (predictor
variables), ii) time duration t, and iii) event indicator δ. If the event was observed
then δ = 1 and t is the time between the covariates being collected and the event
occurring. If the event was not observed then the patient is said to be right-censored,
δ = 0, and t is the time between the covariates being collected and the last contact
with the patient. For example, this could happen if we are conducting a study on the
survival of cancer patients, and some of the patients drop out of the study at random
times. In standard regression methodology, the censored data would be discarded,
which can cause bias in the model. Hence, we have special methodology that makes
use of the censored data.
² Code is available at https://github.com/knottwill/CoxKAN and can be installed using the following
command: pip install coxkan.
where the risk-set R(t_i) is the set of patients with observed time t ≥ t_i (i.e., those
who are alive as of time t_i).
2.1.2 DeepSurv
We can construct a proportional hazards model based on deep learning by using a
neural network to predict the log-partial hazard [8, 31]. It is trained using the “Cox
loss” function, which is the negative log of (5):
$$
\ell_{\text{Cox}} = -\sum_{i:\,\delta_i = 1} \left( \hat{\theta}(x_i) - \log \sum_{j \in R(t_i)} \exp\big(\hat{\theta}(x_j)\big) \right). \tag{6}
$$
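To make the Cox loss concrete, here is a minimal NumPy sketch that evaluates Eq. (6) directly from predicted log-partial hazards, durations and event indicators. It is the exact O(n²) form with the risk set defined as all patients whose observed time is at least t_i; it is not the fast approximation used by CoxKAN, and the function name `cox_loss` is a hypothetical choice for this illustration.

```python
import numpy as np

def cox_loss(theta, durations, events):
    """Negative log partial likelihood of Eq. (6), written as a plain O(n^2) sum.

    theta     : predicted log-partial hazards, shape (n,)
    durations : observed times t, shape (n,)
    events    : event indicators delta (1 = observed, 0 = censored), shape (n,)
    """
    theta = np.asarray(theta, dtype=float)
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)

    # at_risk[i, j] is True when patient j belongs to the risk set R(t_i).
    at_risk = durations[None, :] >= durations[:, None]

    # Numerically stable log-sum-exp over each risk set.
    shift = theta.max()
    log_risk = shift + np.log((at_risk * np.exp(theta - shift)).sum(axis=1))

    # Only patients with an observed event contribute to the outer sum.
    return -np.sum((theta - log_risk)[events])

# With identical predictions, each event contributes log|R(t_i)|.
print(cox_loss([0.0, 0.0, 0.0], [1, 2, 3], [1, 1, 1]))  # log(3) + log(2) + log(1)
```

Assigning higher predicted risk to patients who fail earlier lowers the loss, which is the ranking behaviour the loss is designed to reward.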
is computed from the previous one. The shape of a KAN is defined by [n_0, n_1, ..., n_L],
where n_l is the number of neurons in the l-th layer:
where x_0 is the input to the network and x_L is the output. The departure from MLPs
comes in that between the l-th and (l + 1)-th layer of the network, there are n_l · n_{l+1}
learnable activation functions parameterised using B-splines, allowing them to capture
arbitrary functions (detailed below). The activation function that connects the i-th
neuron in the l-th layer to the j-th neuron in the (l + 1)-th layer is denoted ϕ_{l,j,i}. The
(l + 1)-th layer is then computed as the sum of all incoming post-activations:
$$
x_{l+1,j} = \sum_{i=1}^{n_l} \phi_{l,j,i}(x_{l,i}), \qquad j = 1, \dots, n_{l+1}. \tag{8}
$$
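The layer rule in Eq. (8) can be sketched in a few lines of Python. In this illustration each edge function is an ordinary Python callable rather than a trained B-spline, and the names `kan_layer` and `kan_forward` are assumptions made for the example, not pykan or CoxKAN APIs.

```python
import numpy as np

def kan_layer(x, phi):
    """Apply Eq. (8): output j is the sum over inputs i of phi[j][i](x[i]).

    x   : 1D array with n_l entries
    phi : nested list with n_{l+1} rows and n_l columns of univariate callables
    """
    return np.array([sum(phi_ji(x_i) for phi_ji, x_i in zip(row, x)) for row in phi])

def kan_forward(x, layers):
    """Compose layers so that each layer is computed from the previous one."""
    for phi in layers:
        x = kan_layer(x, phi)
    return x

# A toy network of shape [2, 2, 1] with hand-picked edge functions.
layer0 = [[np.sin, np.square],
          [np.exp, np.abs]]
layer1 = [[lambda v: v, lambda v: 2.0 * v]]
print(kan_forward(np.array([0.0, 3.0]), [layer0, layer1]))  # hidden = [9, 4]
```

Because every edge carries its own univariate function, the hidden values can be inspected edge by edge, which is what makes the later pruning and symbolic fitting steps possible.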
where Φ_l is a matrix of the univariate functions. The output of the network given an
input vector x ∈ R^{n_0} can be written as
All operations are differentiable, allowing KANs to be trained via backpropagation.
Similarly to MLPs, KANs possess the property of universality such that a sufficiently
large KAN with at least one hidden layer can approximate any smooth function on a
compact domain with arbitrary accuracy. An intuitive visualization of a KAN can be
found in Fig. 1.
where w_b, w_s are trainable weights that control the magnitude of the activation,
b(x) is a (non-trainable) basis function used for training stability (analogous to a
Fig. 1 Visualization of Kolmogorov-Arnold Network with shape [2,2,1] - the nodes are connected
by learnable activation functions.
$$
\text{spline}(x) = \sum_{i=0}^{G+k-1} c_i B_{i,k}(x), \tag{12}
$$
where the c_i’s are trainable parameters and the B_{i,k}’s are B-spline basis functions of
degree k on G grid intervals. For sufficiently high k and G, spline(x) can approximate
any smooth 1D function defined on a bounded domain with arbitrary accuracy. The
B_{i,k}’s are only non-zero on finite overlapping intervals, hence B-splines provide local
control over the shape of the function (we can modify part of the function without
affecting the rest). In this work, we only consider k = 3, G ∈ {3, 4, 5} and b(x) = x or
b(x) = silu(x) = x/(1 + e^{−x}).
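The spline in Eq. (12) can be illustrated with a small NumPy sketch based on the Cox-de Boor recursion. Several details here are assumptions rather than facts from the text: the uniform grid on [-1, 1] extended by k knots on each side, the combination w_b·b(x) + w_s·spline(x) (the weighted form used in the original KAN paper [25]), and the function names.

```python
import numpy as np

def bspline_basis(x, knots, k):
    """Evaluate all degree-k B-spline basis functions at x (Cox-de Boor recursion).

    Returns an array of shape (len(x), len(knots) - k - 1).
    """
    x = np.asarray(x, dtype=float)[:, None]
    # Degree 0: indicator functions of the half-open knot intervals.
    B = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)
    for d in range(1, k + 1):
        left = (x - knots[:-(d + 1)]) / (knots[d:-1] - knots[:-(d + 1)])
        right = (knots[d + 1:] - x) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def silu(x):
    return x / (1.0 + np.exp(-x))

def kan_activation(x, coeffs, w_b=1.0, w_s=1.0, grid=(-1.0, 1.0), G=5, k=3):
    """Assumed form: w_b * silu(x) + w_s * sum_i c_i B_{i,k}(x), with G + k coefficients."""
    lo, hi = grid
    h = (hi - lo) / G
    # Uniform grid with G intervals, extended by k knots on each side (assumption).
    knots = lo + h * np.arange(-k, G + k + 1)
    spline = bspline_basis(x, knots, k) @ np.asarray(coeffs, dtype=float)
    return w_b * silu(np.asarray(x, dtype=float)) + w_s * spline

x = np.linspace(-1.0, 1.0, 9)
print(kan_activation(x, np.ones(8), w_b=0.0))  # basis functions sum to one on the grid
```

Setting a single coefficient to a non-zero value changes the curve only on the support of the corresponding basis function, which is the local-control property described above.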
2.2.2 Regularization
For efficiency and interpretability, we would ideally like our KAN to be as small and
simple as possible. However, we may not know in advance the appropriate shape for
the problem. Hence, [25] proposed a regularization and pruning scheme to simplify a
KAN from an initially large network. First, regularization terms are added to the loss
function to encourage sparsity of the KAN neurons and spline coefficients.
The L1 norm of an activation function ϕ is defined to be its average magnitude
over the training batch of NB inputs,
$$
|\phi|_1 \equiv \frac{1}{N_B} \sum_{s=1}^{N_B} \left| \phi(x^{(s)}) \right|, \tag{13}
$$
and that of its spline coefficients c is $|c|_1 = \frac{1}{G+k} \sum_{i=0}^{G+k-1} |c_i|$.
Then, the L1 norm of a full KAN layer Φ with n_in inputs and n_out outputs is
given by the sum of the L1 norms of the individual activations:
$$
|\Phi|_1 \equiv \sum_{i=1}^{n_{in}} \sum_{j=1}^{n_{out}} |\phi_{i,j}|_1. \tag{14}
$$
Similarly, for the layer’s collective set of spline coefficients C, we have
$|C|_1 \equiv \sum_{i=1}^{n_{in}} \sum_{j=1}^{n_{out}} |c_{i,j}|_1$.
The entropy of Φ is defined to be
$$
S(\Phi) \equiv -\sum_{i=1}^{n_{in}} \sum_{j=1}^{n_{out}} \frac{|\phi_{i,j}|_1}{|\Phi|_1} \log \frac{|\phi_{i,j}|_1}{|\Phi|_1}. \tag{15}
$$
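The quantities in Eqs. (13) to (15) are straightforward to compute from a batch of post-activations. The sketch below assumes the post-activations of one layer are stored in an array of shape (N_B, n_in, n_out); that layout and the function name are illustrative assumptions.

```python
import numpy as np

def layer_l1_and_entropy(post_acts):
    """Compute per-edge L1 norms, the layer L1 norm and the layer entropy.

    post_acts[s, i, j] holds phi_{i,j}(x_i^{(s)}) for batch sample s.
    """
    post_acts = np.asarray(post_acts, dtype=float)
    edge_l1 = np.abs(post_acts).mean(axis=0)   # Eq. (13), one value per edge
    layer_l1 = edge_l1.sum()                   # Eq. (14)
    p = edge_l1 / layer_l1                     # normalised edge magnitudes
    nonzero = p > 0                            # treat 0 * log(0) as 0
    entropy = -np.sum(p[nonzero] * np.log(p[nonzero]))  # Eq. (15)
    return edge_l1, layer_l1, entropy

# Two edges of equal magnitude give the maximum entropy log(2).
acts = np.ones((4, 2, 1))
print(layer_l1_and_entropy(acts))
```

Low entropy means the layer's total magnitude is concentrated on a few edges, so penalising entropy alongside the L1 norm pushes the network towards a sparse set of active activation functions.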
3 CoxKAN
CoxKAN is a novel proportional hazards model where the log-partial hazard is
estimated by a KAN with a single output node:
to train compared to MLPs. CoxKAN is optimized using Adam [32], taking steps on
the whole training set (as opposed to batches) for training stability.
Hyperparameter Tuning
We implement random hyperparameter optimization [33] with the Python package
Optuna [34] using the Tree-structured Parzen Estimator [35] algorithm to efficiently
search the hyperparameter space. The objective function we optimize on is the average
C-Index of the pruned CoxKAN over a 4-fold cross-validation of the experiment’s
training set.
Early Stopping
We conduct early stopping based on validation set C-Index. For smaller datasets we
may want to train on the full training set instead of reserving an extra validation set,
hence we include early stopping as an optimizable hyperparameter. For datasets where
early stopping is determined to be optimal, we reserve 20% of the designated
training set as the validation set.
Pruning
After training CoxKAN, we prune activation functions in the network by removing
those that have L1 norms below a certain threshold. This allows for automatic feature
selection and control of the network shape. The L1 threshold is a tunable hyperparam-
eter, but when using a validation set for early stopping, we instead select the optimal
threshold based on validation performance.
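The pruning rule can be sketched as a simple mask over per-edge L1 norms. The example below assumes the norms of one layer are stored in an (n_in, n_out) array; the helper names, and the idea of passing a scoring callable to pick the threshold, are illustrative assumptions rather than the CoxKAN API.

```python
import numpy as np

def prune_by_l1(edge_l1, threshold):
    """Return a keep-mask for edges and a mask of inputs that remain connected.

    edge_l1[i, j] is the L1 norm of the activation from input i to output j.
    """
    edge_l1 = np.asarray(edge_l1, dtype=float)
    keep = edge_l1 >= threshold        # edges below the threshold are removed
    active_inputs = keep.any(axis=1)   # an input with no surviving edge is dropped
    return keep, active_inputs

def select_threshold(candidates, validation_score):
    """Pick the candidate threshold with the best validation score (e.g. C-Index)."""
    return max(candidates, key=validation_score)

norms = np.array([[0.5, 0.001],
                  [0.002, 0.0005],   # a noisy feature with negligible activations
                  [0.2, 0.3]])
keep, active = prune_by_l1(norms, threshold=0.01)
print(active)  # the second input is pruned away entirely
```

When applied to the first layer, an input whose outgoing edges are all pruned no longer influences the output, which is how pruning doubles as feature selection.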
Symbolic Fitting
For interpretability, we would like the activation functions of CoxKAN to be clean
symbolic formulas rather than parameterised B-spline curves.
Reference [25] proposed the following procedure to convert a KAN to a symbolic
representation: If we suspect a given activation function ϕ(x) is approximating a known
symbolic operator f (e.g., sin or exp), then we can set the activation function to
ϕ(x) = c f(ax + b) + d. The affine parameters (a, b, c, d) are found by fitting them
to a set of pre- and post-activations {x^{(s)}, y^{(s)}}_{s=1}^{M}, such that y ≈ c f(ax + b) + d.
This is done by iterative grid search for (a, b) and linear regression for (c, d). The
quality of the fit is measured by the coefficient of determination, R² (also known as “fraction
of variance explained”). We can either inspect the activations by eye and choose a
suitable function to fit, or we can use pykan’s auto_symbolic method, which simply
fits all symbolic operators from a large library and selects the operator that achieves
the highest R².
In this work, we used auto_symbolic with a library of 22 functions (see Appendix
A), with a few additional improvements. Firstly, several of these functions can become
linear with the right choice of affine parameters, but if a learnt activation is linear
then we want this to be reflected in the symbolic formula. Hence, after training and
pruning CoxKAN, we first fit the linear function f(x) = ax + b (special case with
two affine parameters instead of four) to all activation functions and accept the fit if
R² > 0.99, otherwise we proceed normally (auto_symbolic or recognition by eye).
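The fitting procedure described above (grid search over (a, b), linear regression for (c, d), selection by R², with a linear fit tried first) can be sketched as follows. This is an independent illustration, not pykan's implementation: the search range, grid size, number of refinement rounds and all function names are assumptions.

```python
import numpy as np

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_symbolic(f, x, y, span=10.0, n_grid=21, n_rounds=3):
    """Fit y ~ c*f(a*x + b) + d: grid search on (a, b), least squares on (c, d)."""
    a_lo, a_hi, b_lo, b_hi = -span, span, -span, span
    best_r2, best_params = -np.inf, None
    for _ in range(n_rounds):
        a_grid = np.linspace(a_lo, a_hi, n_grid)
        b_grid = np.linspace(b_lo, b_hi, n_grid)
        for a in a_grid:
            for b in b_grid:
                with np.errstate(all="ignore"):
                    z = f(a * x + b)
                if not np.all(np.isfinite(z)):
                    continue  # skip (a, b) outside the operator's domain
                A = np.column_stack([z, np.ones_like(z)])
                (c, d), *_ = np.linalg.lstsq(A, y, rcond=None)
                r2 = r_squared(y, c * z + d)
                if r2 > best_r2:
                    best_r2, best_params = r2, (a, b, c, d)
        # Zoom the grid in around the current best (a, b) for the next round.
        da, db = a_grid[1] - a_grid[0], b_grid[1] - b_grid[0]
        a_lo, a_hi = best_params[0] - da, best_params[0] + da
        b_lo, b_hi = best_params[1] - db, best_params[1] + db
    return best_params, best_r2

def fit_activation(x, y, library, linear_threshold=0.99):
    """Try a linear fit first; otherwise pick the library operator with highest R^2."""
    slope, intercept = np.polyfit(x, y, 1)
    r2_linear = r_squared(y, slope * x + intercept)
    if r2_linear > linear_threshold:
        return "linear", (slope, intercept), r2_linear
    results = {name: fit_symbolic(f, x, y) for name, f in library.items()}
    name = max(results, key=lambda n: results[n][1])
    return name, results[name][0], results[name][1]

x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * np.sin(3.0 * x + 1.0) - 0.5
name, params, r2 = fit_activation(x, y, {"sin": np.sin, "exp": np.exp})
print(name, round(r2, 4))
```

Note that the recovered affine parameters are not unique (for sin, flipping the signs of a, b and c gives the same curve), so the fit quality rather than the raw parameters is the meaningful output.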
Secondly, certain activation functions may be so complex that (a) we cannot recognise
their symbolic form by eye and (b) no operators in our library fit sufficiently well.
In this case, the procedure described above fails. Instead, CoxKAN has the ability
to find a symbolic form for the activation function by using a genetic algorithm to
perform symbolic regression on the pre- and post-activations {x^{(s)}, y^{(s)}}_{s=1}^{M}, searching
a much wider space of symbolic functions. This process
is based on the Python package PySR [30]. To favour simple symbolic operators,
CoxKAN does not use PySR by default.
4 Results
To evaluate CoxKAN as comprehensively as possible we conducted experiments on
both synthetic and real datasets (13 in total). For each experiment, we train CoxKAN
using the procedure described in Section 3 (hyperparameter search → train with spar-
sity regularization → auto-prune → fit symbolic). The hyperparameters found in each
case are detailed in Appendix A.
On the real datasets we compare CoxKAN to CoxPH and DeepSurv. To ensure
a fair comparison, we use the same hyperparameter searching strategy for DeepSurv
as used for CoxKAN, and we boost performance of DeepSurv as much as possible
by enabling modern deep learning techniques such as early stopping, dropout, batch
normalization and weight decay (L2 regularization).
We evaluate all models using the concordance index c (C-Index) [36], which is the
most common metric to judge predictive accuracy of a survival model. It measures
how well the model predictions agree with the ranking of the patients’ survival times,
where c = 0.5 corresponds to random ranking and c = 1 is a perfect ranking. We
obtain 95% confidence intervals by bootstrapping [37] the test set (sampling with
replacement), and characterise the difference in performance between two models as
statistically significant if the confidence intervals do not overlap.
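The evaluation protocol can be illustrated with a small NumPy sketch of the C-Index and a bootstrap confidence interval. The pairwise definition used here (a pair is comparable when the patient with the shorter time had an observed event, and ties in predicted risk count as one half) is the usual Harrell-style convention, and the percentile interval is an assumption, since the text does not state which bootstrap interval was used.

```python
import numpy as np

def concordance_index(risk, durations, events):
    """Fraction of comparable pairs ranked correctly (higher risk fails earlier)."""
    risk = np.asarray(risk, dtype=float)
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    # Pair (i, j) is comparable if i has the shorter time and i's event was observed.
    comparable = (durations[:, None] < durations[None, :]) & events[:, None]
    n_pairs = comparable.sum()
    if n_pairs == 0:
        return np.nan
    higher = risk[:, None] > risk[None, :]
    tied = risk[:, None] == risk[None, :]
    return (np.sum(higher & comparable) + 0.5 * np.sum(tied & comparable)) / n_pairs

def bootstrap_ci(risk, durations, events, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the C-Index (resampling with replacement)."""
    risk, durations, events = map(np.asarray, (risk, durations, events))
    rng = np.random.default_rng(seed)
    n = len(risk)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        scores.append(concordance_index(risk[idx], durations[idx], events[idx]))
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.nanpercentile(scores, [tail, 100.0 - tail])
    return lower, upper

t = np.arange(1, 21)
e = np.ones(20)
print(concordance_index(-t, t, e))   # risk decreases with survival time
print(concordance_index(np.zeros(20), t, e))  # uninformative predictions
```

Under the criterion in the text, two models would be called significantly different when the intervals returned by `bootstrap_ci` for each model do not overlap.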
Fig. 2 Visualization of the CoxKAN pruning and symbolic fitting pipeline for the synthetic dataset
generated using a Gaussian log-partial hazard.
Often the symbolic formula predicted by CoxKAN contains both terms where
covariates contribute to the log-partial hazard in isolation (without interacting) and
terms that involve the interaction between covariates. We refer to the former as
“isolation terms” and the latter as “interaction terms”.
³ Note that since survival time is a random variable, the true formula does not achieve c = 1. In fact,
CoxKAN Symbolic actually achieves a slightly higher C-Index than the true formula on the Gaussian
dataset. This is a result of variance and does not suggest that CoxKAN can be “better” than the ground
truth. In the limit of an infinite dataset, achieving a higher C-Index than the true formula is impossible.
Table 1 Synthetic Datasets: C-Index (95% Confidence Interval)
Fig. 3 Visualizations of CoxKAN trained on synthetic datasets, after pruning but before symbolic
fitting. The ϵ’s represent the irrelevant features added to each dataset (successfully pruned in all cases).
4.1.2 Shallow Formula
It is common in survival data to encounter covariates which satisfy the linear CoxPH
assumptions after some non-linear transformation. That is, they have non-linear
relationships to the patient’s risk but they do not interact with each other.
To determine whether CoxKAN can automatically detect and solve this situation,
we set the log-partial hazard to
By multiplying this out and making some liberal approximations to the affine
parameters, we recover the original formula:
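$$
\begin{aligned}
\hat{\theta}_{KAN} &\approx 4 \cdot \tfrac{1}{2} \sqrt{\left(x_1^2 - 2x_1x_2 + x_2^2 + x_3^2 - 2x_3x_4 + x_4^2\right)} \\
&= 2\sqrt{(x_1 - x_2)^2 + (x_3 - x_4)^2}.
\end{aligned}
$$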
where x_1 ∈ [0.1, 1], x_2 ∈ [−1, 1]. The intuition here is that tanh(5z) has a shallow
gradient in most of its input domain, hence large portions of the input space have
similar survival functions, which should cause the data to have a weak training signal.
Fig. 4 Log-partial hazard surfaces of the (a) true formula θ(x) = tanh(5(log(x1 ) + |x2 |)) and the (b)
‘incorrect’ CoxKAN-predicted formula. They are very similar, demonstrating a strong approximation
in the relevant domain.
Table 1 tells us that, despite being the ‘wrong’ formula, there is no statistically
significant difference between the C-Index of θ̂_KAN and the true log-partial hazard.
In Fig. 4 we visualise the true and predicted log-partial hazards, which look
extremely similar. This tells us that in the domain of interest, the predicted formula
is a very strong approximation to the true formula. We argue that CoxKAN
still exhibits high performance and interpretability in this case, since the effect of the
covariates on the hazard is still accurately captured in a symbolic form.
Table 2 Clinical datasets: C-Index (95% Confidence Interval). Highest C-Index in bold.
overlap but it occurred in 4 of 5 datasets and it is intuitive that the pruning and
symbolic fitting pipeline would reduce variance error. Pruning removes irrelevant noisy
features and makes the network smaller, and symbolic fitting smooths out the activations;
thus, this pipeline provides an inductive bias towards simpler functions
that are more likely to generalize well. The rest of this section analyses the results of
each experiment in more depth.
Table 3 Summary of CoxKAN-extracted interaction between age and cancer status in the SUPPORT
dataset. We verify the interaction by splitting the patients into the relevant subgroups and fitting
CoxPH to the age column.
given patient. In Fig. 5(c), we re-plot ϕinteract with colour indicating the patient’s
cancer status (top) and age (bottom). We observe that:
• Patients with non-metastatic cancer are at high risk and the risk initially decreases
with age (until approximately 60 years old) and then increases.
• Patients without cancer are at lower risk, but their risk sharply increases with age.
• Patients with metastatic cancer are at the highest risk and their risk increases
non-linearly with age.
Obtaining insights like this using existing survival methods would require cumbersome
work involving the stratification of patients into subgroups and searching for
trends. This result demonstrates the power of CoxKAN to automatically extract
complex insights from survival data.
To verify this interaction as a true property of the dataset, we split the patients
into 4 relevant subgroups and fitted CoxPH on the age column. The interaction and
corresponding verification are summarised in Table 3.
Fig. 5 Visualization of how CoxKAN extracts a meaningful interaction between two covariates in
the SUPPORT dataset. (a) Full network where activation functions involved in the interaction are
non-symbolic (shown in black), and all other activation functions are symbolic (shown in red). (b)
Sub-network that encodes the interaction between patient age and cancer status. Each data point in
each activation function represents the value of that activation function for a given patient. (c) Top:
ϕinteract where colour indicates cancer status, Bottom: ϕinteract where colour indicates age.
Fig. 6 Some of the non-linear symbolic terms in the CoxKAN-predicted hazard for the SUPPORT
dataset. Each data-point represents a patient.
where we plot some of the non-linear isolation terms in Fig. 6 for clarity. The isolated
age and cancer status terms can be neglected since most of the effect from these
covariates comes from ϕinteract .
We have just seen that we can leave ϕinteract as the original B-spline curve and
achieve strong interpretability purely by visualization. However, if we still desire a
symbolic form then we can use PySR [30] to find an accurate representation. In Fig. 7
we plot the symbolic fits by using the default auto_symbolic method vs using PySR.
We see that auto_symbolic causes the loss of important information, whereas PySR
retains essentially all information with the following expression:
Fig. 7 Symbolic fitting to the complex activation function ϕ_interact using pykan’s auto_symbolic
method (left) vs PySR (right). This is an example where auto_symbolic fails to capture all important
information.
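$$
\begin{aligned}
\hat{\theta}_{KAN} ={}& \begin{cases} -0.21 & \text{if hormonal therapy} \\ 0.28 & \text{otherwise} \end{cases}
+ \begin{cases} -0.07 & \text{if tumor size} \le 20\ \text{mm} \\ 0.21 & \text{if } 20 < \text{tumor size} < 50\ \text{mm} \\ 0.48 & \text{if tumor size} \ge 50\ \text{mm} \end{cases} \\
&+ \begin{cases} -0.12 & \text{if pre-menopausal} \\ 0.23 & \text{if post-menopausal} \end{cases}
+ 1.8\,(1 - 0.02 \cdot \text{age})^2 - 1.2\,e^{-0.02(\text{nodes} + 0.4)^2} \\
&+ 0.1 \cosh(0.002 \cdot \text{PGR} - 1.6) - 0.0007 \cdot \text{ER}.
\end{aligned}
$$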
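To show how such a symbolic formula can be used directly, the sketch below transcribes the GBSG expression above into a plain Python function. The placement of the two squared terms follows our reading of the typeset formula and should be treated as an assumption; the function and argument names are hypothetical.

```python
import math

def gbsg_log_partial_hazard(hormonal_therapy, tumor_size_mm, post_menopausal,
                            age, nodes, pgr, er):
    """Evaluate the CoxKAN symbolic log-partial hazard for one GBSG patient.

    Coefficients are copied from the formula in the text; the squared terms
    for age and nodes are an assumed reading of the garbled superscripts.
    """
    theta = -0.21 if hormonal_therapy else 0.28
    if tumor_size_mm <= 20:
        theta += -0.07
    elif tumor_size_mm < 50:
        theta += 0.21
    else:
        theta += 0.48
    theta += 0.23 if post_menopausal else -0.12
    theta += 1.8 * (1.0 - 0.02 * age) ** 2
    theta -= 1.2 * math.exp(-0.02 * (nodes + 0.4) ** 2)
    theta += 0.1 * math.cosh(0.002 * pgr - 1.6)
    theta -= 0.0007 * er
    return theta

patient = dict(tumor_size_mm=30, post_menopausal=True, age=55, nodes=3, pgr=100, er=50)
with_therapy = gbsg_log_partial_hazard(hormonal_therapy=True, **patient)
without_therapy = gbsg_log_partial_hazard(hormonal_therapy=False, **patient)
# Under proportional hazards, exp of the difference is the hazard ratio.
print(math.exp(with_therapy - without_therapy))
```

Under this reading the age term is minimised at 50 years and the PGR term at 800, which is one way to locate the troughs that the text describes as a “sweet spot” for these covariates.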
This formula is visualized within the structure of CoxKAN in Fig. 8(a). Interest-
ingly, we observe a single trough in the activation functions of age and concentration
of progesterone receptor (PGR), indicating a “sweet spot” for these covariates.
The CoxPH-predicted log-partial hazard is
which has similar trends to θ̂_KAN (i.e., patient risk increases with tumor size, number
of lymph nodes and menopause, and decreases with hormonal therapy and ER concentration)
but has worse performance, which can be attributed to bias error due to
the linear assumption.
Fig. 8 Visualizations of CoxKAN Symbolic for the following datasets: (a) GBSG, (b) METABRIC,
(c) FLCHAIN. Activations containing a “c” are functions of categorical covariates that were converted
to a discrete map.
ERBB2, MKI67 ) and 5 clinical features (age and indicators for hormone treatment,
radiotherapy, chemotherapy, estrogen receptor). There are 1,523 patients for training
and 381 for testing. CoxKAN predicts the log-partial hazard as
We visualize this formula in Fig. 8(b). These genes are among the most extensively
studied in breast cancer; increased PGR expression is associated with better prog-
nosis [43] and increased expression of ERBB2 and MKI67 is associated with poorer
prognosis and highly aggressive tumors [44]. These effects are re-discovered here using
CoxKAN, with precise symbolic formulas.
holds on this dataset. CoxKAN predicts:
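$$
\begin{aligned}
\hat{\theta}_{KAN} ={}& 0.09 \cdot \text{age} + \begin{cases} -0.047 & \text{if female} \\ 0.118 & \text{if male} \end{cases} + 0.4 \arctan(0.4 \cdot \text{year} - 737) + 0.04 \cdot \text{FLCkappa} \\
&+ 0.3 \cdot \text{FLClambda} + 0.009 \cdot \text{FLCgroup} + 2 \arctan(0.5 \cdot \text{creatinine} - 0.9), \\
\hat{\theta}_{CPH} ={}& 0.1 \cdot \text{age} + 0.3 \cdot \text{sex} + 0.06 \cdot \text{year} + 0.01 \cdot \text{FLCkappa} + 0.2 \cdot \text{FLClambda} \\
&+ 0.06 \cdot \text{FLCgroup} + 0.03 \cdot \text{creatinine} + 0.3 \cdot \text{mgus}.
\end{aligned}
$$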
All trends are essentially the same, which validates that CoxKAN can handle
situations where the linear assumption is appropriate.
Fig. 9 Visualization of CoxKAN on the NWTCO dataset: (a) CoxKAN after symbolic fitting
(ϕ_{1,1,1}, ϕ_{1,1,2}, ϕ_{1,1,5} are all linear), (b) Interpretable visualizations of the interaction term ϕ_{1,1,3}.
Top: All patients where colour indicates central histology reading, Middle: Patients with unfavourable
histology where colour indicates age, Bottom: Patients with favourable histology where colour indicates
age.
and ϕ_{1,1,4} does not encode any particularly strong interactions between covariates and
has a smaller effect on the hazard than other terms, thus it is not worth discussing.
As we might expect, the isolation terms tell us that unfavourable histology and later
stage cancer are associated with poorer prognosis.
Table 4 Summary of CoxKAN-extracted interaction between age and histology in
the NWTCO dataset. We verify the interaction by splitting the patients into the
relevant subgroups and fitting CoxPH to the age column.
We plot the interaction term ϕ_{1,1,3} in Fig. 9(b), where colour indicates central
histology readings (top), age for patients with unfavourable histology (middle), and age
for patients with favourable histology (bottom). It turns out that the full interaction
comes from considering the composite term ϕ_{1,1,3} + 0.02 · age, and can be summarised
as follows:
• For patients with favourable histology, ϕ_{1,1,3} is not significant and +0.02 · age
dominates such that increasing age is good for prognosis.
• For patients with unfavourable histology, ϕ_{1,1,3} is a sharply decreasing function,
such that the overall effect of increasing age on prognosis is negative (particularly for
younger ages).
We validate this interaction by splitting the cohort into subsets and fitting CoxPH
on the age column. The results are summarised in Table 4.
Datasets
In total, we curated four genomics datasets with diverse cancer types: Breast Invasive
Carcinoma (BRCA), Stomach Adenocarcinoma (STAD), Glioma (GBM/LGG), and
Kidney Renal Clear Cell Carcinoma (KIRC). To ensure a representative test set, we
divided each dataset into training (80%) and test (20%) sets by stratifying according
to the distribution of observed durations and event indicators. All datasets include
sparse mutation features and heavily skewed mRNA expression data. The GBM/LGG
and KIRC datasets, as preprocessed in [50], also exhibit significant multicollinearity
in the CNV features. For STAD and BRCA (preprocessed by us), we solved the mul-
ticollinearity issue in the CNV features, allowing us to evaluate the high-dimensional
datasets with and without multicollinearity. Specifically, the preprocessing pipeline of
STAD and BRCA is as follows: (1) Features were selected based on p-values derived
from univariate CoxPH analysis. (2) Groups of highly correlated CNV features were
consolidated by replacing them with a single feature representing the median value.
(3) Missing values were imputed using the random forest imputation method.
As a result, the BRCA dataset contains 811 training patients, 205 testing patients,
and has 168 features in total (73 CNVs, 91 RNAs, 4 Mutations). The STAD dataset
contains 284 training patients, 71 testing patients, and has 148 features (67 CNVs,
61 RNAs, 20 Mutations). The GBM/LGG dataset contains 400 training patients and
100 testing patients. There are 320 features in total, consisting of the mutation status
of the IDH1 gene, 240 RNAs and 79 CNVs (including the binary status of 1p19q
arm codeletion). Finally, the KIRC dataset contains 388 training patients, 97 testing
patients, and consists of 362 features (116 CNVs, 240 RNAs, and 6 Mutations).
Analysis
For STAD, BRCA, and GBM/LGG, the hyperparameter search of CoxKAN determined
that using no hidden layers is optimal. This is likely because a shallow
KAN has less capacity for overfitting, but also suggests that there may not be
significant interactions between the genomic features.
Similarly to the previous section, we compare the performance of CoxKAN to
CoxPH and DeepSurv. However, this data is so prone to overfitting that even CoxPH
can overfit (where usually it is assumed to suffer primarily from bias error alone). For
a fairer comparison (and to solve numerical issues due to multicollinearity), we also
evaluated CoxPH with heavy Lasso (L1) regularization (“CoxPH Lasso”).
The results are shown in Table 5. It is clear that CoxPH without regularization
either encounters numerical problems or is only slightly better than random guess-
ing. Introducing heavy Lasso regularization significantly improves the performance
of CoxPH, even outperforming DeepSurv to a statistically significant degree on the
STAD dataset. CoxKAN Symbolic demonstrates consistent and robust performance;
it is either competitive with or surpasses CoxPH Lasso and DeepSurv on all datasets.
We analyse the interpretable log-partial hazard formulas generated by CoxKAN on
the GBM/LGG and BRCA datasets, where CoxKAN Symbolic outperforms CoxPH
with Lasso regularization. For STAD and KIRC, CoxKAN Symbolic achieves performance
comparable to CoxPH Lasso; please refer to Appendix C for these two
formulas.
Given the high dimensionality of features in these datasets, the log-partial hazard
formulas derived using CoxKAN become quite large. To simplify these formulas, we
estimate the relative importance of each term using its standard deviation σ over the
Table 5 Genomics datasets: C-Index (95% Confidence Interval). Highest C-Index in bold.
full dataset. Terms with higher standard deviations have a greater impact on the log-partial
hazard. For the derived CoxKAN formulas of each dataset, we only present
the terms with the highest standard deviations, σ. One caveat is that certain terms have
extreme values for specific samples, inflating the standard deviation without significantly
affecting corresponding rankings. To address this, we exclude outlier values
when calculating the standard deviation for each term.
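A minimal sketch of this term-ranking idea is shown below. The text does not specify how outliers are identified, so trimming values outside the 1st to 99th percentile range is purely an assumption made for illustration, as are the function names.

```python
import numpy as np

def trimmed_std(term_values, lower_pct=1.0, upper_pct=99.0):
    """Standard deviation of a term's values after dropping extreme samples.

    The percentile-based outlier rule is an assumption; the paper only states
    that outlier values are excluded.
    """
    v = np.asarray(term_values, dtype=float)
    lo, hi = np.percentile(v, [lower_pct, upper_pct])
    kept = v[(v >= lo) & (v <= hi)]
    return kept.std()

def rank_terms(terms):
    """Sort term names from most to least important by trimmed standard deviation."""
    return sorted(terms, key=lambda name: trimmed_std(terms[name]), reverse=True)

# A term with one extreme sample: the raw std is inflated, the trimmed std is not.
values = np.r_[np.linspace(-1.0, 1.0, 199), 1000.0]
print(np.std(values), trimmed_std(values))
```

With a rule like this, a term that is nearly constant apart from one extreme patient is ranked below a term that varies moderately across the whole cohort.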
Fig. 10 Visualization of the most significant non-linear terms in the CoxKAN-predicted hazard
for the GBM/LGG dataset. Data points represent test-set patients. For points that correspond to
multiple patients, the number of patients is indicated. Note that the x-axis shows the true measured
value of each feature, whereas the quoted equations are in terms of standardised features.
cell functions like division and apoptosis and its amplification is a strong indicator of
poor outcomes [60]. As shown in Fig. 10, the term-based contributions in the generated
hazard equation align with these findings, except for a few cases.
As for the remaining terms (CARD11 and JAK2), CARD11 CNVs have been
shown to be implicated in tumor progression in some cancer types like colorectal cancer [61],
and the JAK2 gene is crucial for hematopoietic and immune signaling. Frequent loss
of CDKN2A in tumors, including melanoma, often coincides with JAK2 deletion,
leading to IFNγ resistance, which is associated with resistance to immunotherapy [62].
The term-based contributions of CARD11 and JAK2 derived from CoxKAN show a
similar pattern to these studies. However, there are currently no studies indicating a
role for CARD11 and JAK2 in glioma progression. Our findings suggest that further
research is needed to understand their biological function in this context.
Fig. 11 Visualization of the most significant non-linear terms in the CoxKAN-predicted hazard for
the BRCA dataset. Data points represent test-set patients. For points that correspond to multiple
patients, the number of patients is indicated. Note that the x-axis shows the true measured value
of each feature, whereas the quoted equations are in terms of standardised features.
under-expression can influence patient risk. These CoxKAN results highlight the com-
plexity of these roles in breast cancer and emphasize the need for further research to
understand their biological implications better.
Key Findings
In the first series of experiments, we generated synthetic datasets using custom sym-
bolic formulas for the hazard function and found that in 3/4 examples CoxKAN was
able to recover the correct symbolic form. In the last example (which was made to be
intentionally difficult to recover), CoxKAN found a formula that was shown to be a
highly accurate approximation to the ground truth; we claim that CoxKAN still pos-
sesses the properties of interpretability and high performance in this case. Additionally,
CoxKAN automatically pruned the irrelevant, noisy features added to all synthetic
datasets, demonstrating successful feature selection. We then evaluated CoxKAN on
5 clinical datasets and 4 high-dimensional genomics datasets. On the clinical data,
CoxKAN Symbolic achieved a statistically significant improvement in performance
over CoxPH in 4/5 cases and over DeepSurv in 3/5 cases. On the genomics data,
CoxKAN Symbolic achieved a statistically significant performance improvement over
DeepSurv in 2/4 cases and outperformed CoxPH with heavy Lasso regularization
twice (though only one of these improvements was statistically significant). On datasets
where CoxKAN did not outperform CoxPH or DeepSurv, the performance difference was
generally not statistically significant, as characterised by overlapping confidence intervals. CoxKAN
also uncovered useful insights from the survival data. For example, on the SUPPORT
dataset, CoxKAN identified that the risk of patients with metastatic cancer decreases
with age until about 60 years old and then starts to increase, whereas for patients with
non-metastatic cancer or no cancer at all, risk only increases with age. This kind
of variable interaction would be extremely difficult to identify using existing survival
models. On the genomics datasets, CoxKAN uncovered a number of important biolog-
ical associations between cancer risk and genomic features such as specific CNVs and
mRNA transcripts, offering valuable insights that can guide further biological studies
and the development of targeted therapeutic strategies.
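The comparison criterion used above (C-Index with a 95% confidence interval, with non-overlapping intervals read as a significant difference) can be sketched as follows. This is an illustrative sketch only: the percentile bootstrap is an assumed way of obtaining the interval, and the function names and synthetic data are hypothetical rather than taken from the CoxKAN code.

```python
import numpy as np

def c_index(time, event, risk):
    """Harrell's concordance index for right-censored data."""
    time, event, risk = (np.asarray(a) for a in (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in np.flatnonzero(event):
        later = time > time[i]  # patients observed for longer than patient i
        comparable += int(later.sum())
        concordant += float(np.sum(risk[i] > risk[later]))
        concordant += 0.5 * float(np.sum(risk[i] == risk[later]))  # risk ties count half
    return concordant / comparable if comparable else float("nan")

def bootstrap_ci(time, event, risk, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the C-index (assumed method)."""
    time, event, risk = (np.asarray(a) for a in (time, event, risk))
    rng = np.random.default_rng(seed)
    n = len(time)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        stats.append(c_index(time[idx], event[idx], risk[idx]))
    lo, hi = np.nanpercentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Synthetic example: model A scores patients with the true log-risk,
# model B scores them with pure noise.
rng = np.random.default_rng(1)
n = 300
x = rng.standard_normal(n)
event_time = rng.exponential(1.0 / np.exp(x))   # hazard proportional to exp(x)
censor_time = rng.exponential(2.0, size=n)
time = np.minimum(event_time, censor_time)
event = event_time <= censor_time
ci_a = bootstrap_ci(time, event, x)
ci_b = bootstrap_ci(time, event, rng.standard_normal(n))
print("model A:", ci_a, "model B:", ci_b, "overlap:", intervals_overlap(ci_a, ci_b))
```

Overlap of confidence intervals is a conservative heuristic rather than a formal hypothesis test, which is consistent with the cautious wording "generally not statistically significant" used above.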
Potential Applications of CoxKAN
Given that CoxKAN is essentially the first survival model with sophisticated
interpretability and low bias, we believe it has far-ranging applications, both within the
medical field and in other disciplines. In medical research, CoxKAN could be used
to discover complex biomarkers involving multi-variable interactions and to assess
treatment efficacy by providing insights into how treatment conditions impact
survival and interact with patient features. In a clinical setting, CoxKAN could be used for
personalized medicine by using its predictions/insights to inform treatment plans.
Outside of the medical field, CoxKAN could be used to understand and address under-
lying factors that impact the time to mechanical failure in engineering (helping to
inform construction of equipment), customer churn in business (guiding the devel-
opment of retention strategies), loan default in finance (improving risk assessment
models) and insurance claims (allowing actuaries to justify premiums).
6 Data availability
The clinical datasets METABRIC, SUPPORT and GBSG are available at
https://github.com/jaredleekatzman/DeepSurv/tree/master/experiments/data
and NWTCO and FLCHAIN are available at
https://vincentarelbundock.github.io/Rdatasets/. TCGA genomic data (BRCA, STAD,
GBM/LGG, and KIRC) are available at https://portal.gdc.cancer.gov.
7 Code availability
The code for training and evaluating CoxKAN is available at
https://github.com/knottwill/CoxKAN, and can be installed using the following
command: “pip install coxkan”.
8 Acknowledgements
We acknowledge funding and support from Cancer Research UK and the Cancer
Research UK Cambridge Centre [CTRQQR-2021-100012], The Mark Foundation for
Cancer Research [RG95043], GE HealthCare, and the CRUK National Cancer Imaging
Translational Accelerator (NCITA) [A27066]. Additional support was also provided
by the National Institute of Health Research (NIHR) Cambridge Biomedical Research
Centre [NIHR203312] and EPSRC Tier-2 capital grant [EP/P020259/1]. The funders
had no role in study design, data collection and analysis, decision to publish, or prepa-
ration of the manuscript. Calculations were performed in part using the Sulis Tier 2
HPC platform hosted by the Scientific Computing Research Technology Platform at
the University of Warwick. Sulis is funded by EPSRC Grant EP/T022108/1 and the
HPC Midlands+ consortium.
References
[1] Koene, R.J., Prizment, A.E., Blaes, A., Konety, S.H.: Shared risk factors in
cardiovascular disease and cancer. Circulation 133, 1104–1114 (2016) https:
//doi.org/10.1161/CIRCULATIONAHA.115.020406
[2] Saegusa, T., Zhao, Z., Ke, H., et al.: Detecting survival-associated biomarkers
from heterogeneous populations. Scientific Reports 11(1), 3203 (2021) https://
doi.org/10.1038/s41598-021-82332-y
[3] Ou, F.S., Michiels, S., Shyr, Y., Adjei, A.A., Oberg, A.L.: Biomarker discovery
and validation: Statistical considerations. Journal of Thoracic Oncology 16(4),
537–545 (2021) https://doi.org/10.1016/j.jtho.2021.01.1616
[4] Mok, T.S., Wu, Y.-L., Thongprasert, S., Yang, C.-H., Chu, D.-T., Saijo,
N., Sunpaweravong, P., Han, B., Margono, B., Ichinose, Y., Nishi-
waki, Y., Ohe, Y., Yang, J.-J., Chewaskulyong, B., Jiang, H., Duffield,
E.L., Watkins, C.L., Armour, A.A., Fukuoka, M.: Gefitinib or carbo-
platin–paclitaxel in pulmonary adenocarcinoma. New England Journal of
Medicine 361(10), 947–957 (2009) https://doi.org/10.1056/NEJMoa0810699
https://www.nejm.org/doi/pdf/10.1056/NEJMoa0810699
[5] Le-Rademacher, J., Wang, X.: Time-to-event data: An overview and analysis
considerations. Journal of Thoracic Oncology 16(7), 1067–1074 (2021) https://
doi.org/10.1016/j.jtho.2021.04.004
[6] Monnickendam, G., Zhu, M., McKendrick, J., Su, Y.: Measuring survival benefit
in health technology assessment in the presence of nonproportional hazards. Value
in Health 22(4), 431–438 (2019) https://doi.org/10.1016/j.jval.2019.01.005
[7] Hurwitz, H., Fehrenbacher, L., Novotny, W., Cartwright, T., Hainsworth, J.,
Heim, W., Berlin, J., Baron, A., Griffing, S., Holmgren, E., Ferrara, N., Fyfe,
G., Rogers, B., Ross, R., Kabbinavar, F.: Bevacizumab plus irinotecan, fluo-
rouracil, and leucovorin for metastatic colorectal cancer. New England Journal
of Medicine 350(23), 2335–2342 (2004) https://doi.org/10.1056/NEJMoa032691
https://www.nejm.org/doi/pdf/10.1056/NEJMoa032691
[8] Katzman, J.L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., Kluger, Y.:
Deepsurv: personalized treatment recommender system using a cox proportional
hazards deep neural network. BMC Medical Research Methodology 18(1) (2018)
https://doi.org/10.1186/s12874-018-0482-1
[9] Cox, D.R.: Regression models and life-tables. Journal of the Royal Statistical
Society: Series B (Methodological) 34(2), 187–202 (1972)
[10] Ishwaran, H., Kogalur, U.B.: Random survival forests for r. R News 7(2), 25–31
(2007)
[11] Ishwaran, H., Kogalur, U.B., Blackstone, E.H., Lauer, M.S.: Random survival
forests. Ann. Appl. Statist. 2(3), 841–860 (2008)
[12] Fernandez, T., Rivera, N., Teh, Y.W.: Gaussian processes for survival analysis.
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances
in Neural Information Processing Systems, vol. 29, pp. 5021–5029 (2016)
[13] Alaa, A.M., Schaar, M.: Deep multi-task gaussian processes for survival analy-
sis with competing risks. In: Proceedings of the 31st International Conference
on Neural Information Processing Systems. NIPS’17, pp. 2326–2334. Curran
Associates Inc., Red Hook, NY, USA (2017)
[14] Yu, C.-N., Greiner, R., Lin, H.-C., Baracos, V.: Learning patient-specific cancer
survival distributions as a sequence of dependent regressors. In: Shawe-Taylor, J.,
Zemel, R., Bartlett, P., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural
Information Processing Systems, vol. 24 (2011)
[15] Lee, C., Zame, W., Yoon, J., Schaar, M.: Deephit: A deep learning approach to
survival analysis with competing risks. Proceedings of the AAAI Conference on
Artificial Intelligence 32(1) (2018) https://doi.org/10.1609/aaai.v32i1.11842
[16] Ren, K., Qin, J., Zheng, L., Yang, Z., Zhang, W., Qiu, L., Yu, Y.: Deep Recurrent
Survival Analysis (2018). https://arxiv.org/abs/1809.02403
[17] Ching, T., Zhu, X., Garmire, L.X.: Cox-nnet: An artificial neural network method
for prognosis prediction of high-throughput omics data. PLOS Computational
Biology 14(4), 1006076 (2018) https://doi.org/10.1371/journal.pcbi.1006076
[18] Kvamme, H., Borgan, O.: Continuous and discrete-time survival prediction with
neural networks. Lifetime Data Analysis 27(4), 710–736 (2021) https://doi.org/10.1007/s10985-021-09532-6
[19] Kvamme, H., Borgan, Ø., Scheel, I.: Time-to-event prediction with neural networks
and cox regression. Journal of Machine Learning Research 20(129), 1–30 (2019)
[20] Nagpal, C., Li, X., Dubrawski, A.: Deep survival machines : Fully parametric
survival regression and representation learning for censored data with competing
risks. IEEE Journal of Biomedical and Health Informatics PP, 1–1 (2021) https:
//doi.org/10.1109/JBHI.2021.3052441
[21] Nagpal, C., Yadlowsky, S., Rostamzadeh, N., Heller, K.: Deep cox mixtures for
survival regression. Machine Learning for Healthcare Conference (2021). PMLR
[22] Lu, S.C., Swisher, C.L., Chung, C., Jaffray, D., Sidey-Gibbons, C.: On the impor-
tance of interpretable machine learning predictions to inform clinical decision
making in oncology. Frontiers in Oncology 13, 1129380 (2023) https://doi.org/10.3389/fonc.2023.1129380
[23] Langbein, S.H., Krzyziński, M., Spytek, M., Baniecki, H., Biecek, P., Wright,
M.N.: Interpretable Machine Learning for Survival Analysis (2024). https://arxiv.org/abs/2403.10250
[24] Wiegrebe, S., Kopper, P., Sonabend, R., Bischl, B., Bender, A.: Deep learning
for survival analysis: a review. Artificial Intelligence Review 57(3) (2024) https:
//doi.org/10.1007/s10462-023-10681-3
[25] Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., Hou, T.Y.,
Tegmark, M.: KAN: Kolmogorov-Arnold Networks (2024)
[26] Vaca-Rubio, C.J., Blanco, L., Pereira, R., Caus, M.: Kolmogorov-Arnold Networks
(KANs) for Time Series Analysis (2024). https://arxiv.org/abs/2405.08790
[27] Genet, R., Inzirillo, H.: A Temporal Kolmogorov-Arnold Transformer for Time
Series Forecasting (2024). https://arxiv.org/abs/2406.02486
[28] Li, C., Liu, X., Li, W., Wang, C., Liu, H., Yuan, Y.: U-KAN Makes Strong
Backbone for Medical Image Segmentation and Generation (2024). https://arxiv.org/abs/2406.02918
[30] Cranmer, M.: Interpretable Machine Learning for Science with PySR and
SymbolicRegression.jl (2023). https://arxiv.org/abs/2305.01582
[31] Faraggi, D., Simon, R.: A neural network model for survival data. Statistics in
medicine 14(1), 73–82 (1995)
[32] Kingma, D., Ba, J.: Adam: A method for stochastic optimization. In: International
Conference on Learning Representations (ICLR), San Diego, CA, USA (2015)
[33] Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. The
Journal of Machine Learning Research 13(1), 281–305 (2012)
[34] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-
generation hyperparameter optimization framework. In: Proceedings of the 25th
ACM SIGKDD International Conference on Knowledge Discovery & Data Min-
ing. KDD ’19, pp. 2623–2631. Association for Computing Machinery, New York,
NY, USA (2019). https://doi.org/10.1145/3292500.3330701
[36] Altman, D.G., Royston, P.: What do we mean by validating a prognostic model?
Statistics in medicine 19(4), 453–473 (2000)
[38] Knaus, W.A., Harrell, F.E., Lynn, J., Goldman, L., Phillips, R.S., Connors, A.F.,
Dawson, N.V., Fulkerson, W.J., Califf, R.M., Desbiens, N., et al.: The support
prognostic model: objective estimates of survival for seriously ill hospitalized
adults. Annals of internal medicine 122(3), 191–203 (1995)
[39] Foekens, J.A., Peters, H.A., Look, M.P., Portengen, H., Schmitt, M., Kramer,
M.D., Brünner, N., Jänicke, F., Meijer-van Gelder, M.E., Henzen-Logmans, S.C.,
et al.: The urokinase system of plasminogen activation and prognosis in 2780
breast cancer patients. Cancer research 60(3), 636–643 (2000)
[40] Schumacher, M., Bastert, G., Bojar, H., Huebner, K., Olschewski, M., Sauerbrei,
W., Schmoor, C., Beyerle, C., Neumann, R., Rauschecker, H.: Randomized 2 x 2
trial evaluating hormonal treatment and the duration of chemotherapy in node-
positive breast cancer patients. german breast cancer study group. Journal of
Clinical Oncology 12(10), 2086–2093 (1994)
[41] Royston, P., Altman, D.G.: External validation of a cox prognostic model: prin-
ciples and methods. BMC Medical Research Methodology 13(1), 33 (2013)
https://doi.org/10.1186/1471-2288-13-33
[42] Curtis, C., Shah, S.P., Chin, S.-F., Turashvili, G., Rueda, O.M., Dunning, M.J.,
Speed, D., Lynch, A.G., Samarajiwa, S., Yuan, Y., et al.: The genomic and tran-
scriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature
486(7403), 346–352 (2012)
[43] Kurozumi, S., Matsumoto, H., Hayashi, Y., et al.: Power of pgr expression as
a prognostic factor for er-positive/her2-negative breast cancer patients at inter-
mediate risk classified by the ki67 labeling index. BMC Cancer 17, 354 (2017)
https://doi.org/10.1186/s12885-017-3331-4
[44] Cheang, M.C.U., Chia, S.K., Voduc, D., Gao, D., Leung, S., Snider, J.,
Watson, M., Davies, S., Bernard, P.S., Parker, J.S., Perou, C.M., Ellis,
M.J., Nielsen, T.O.: Ki67 Index, HER2 Status, and Prognosis of Patients
With Luminal B Breast Cancer. JNCI: Journal of the National Cancer
Institute 101(10), 736–750 (2009) https://doi.org/10.1093/jnci/djp082
https://academic.oup.com/jnci/article-pdf/101/10/736/18074850/djp082.pdf
[46] Kyle, R.A., Therneau, T.M., Rajkumar, S.V., Larson, D.R., Plevak, M.F.,
Offord, J.R., Dispenzieri, A., Katzmann, J.A., Melton, L.J.: Prevalence of mon-
oclonal gammopathy of undetermined significance. New England Journal of
Medicine 354(13), 1362–1369 (2006) https://doi.org/10.1056/NEJMoa054494
https://www.nejm.org/doi/pdf/10.1056/NEJMoa054494
[47] Dispenzieri, A., Katzmann, J.A., Kyle, R.A., Larson, D.R., Therneau, T.M.,
Colby, C.L., Clark, R.J., Mead, G.P., Kumar, S., Melton, L.J. 3rd, Rajkumar,
S.V.: Use of nonclonal serum immunoglobulin free light chains to predict overall
survival in the general population. Mayo Clin Proc 87(6), 517–523 (2012)
[48] Green, D.M., Thomas, P.R.M., Shochat, S.: The treatment of wilms tumor:
Results of the national wilms tumor studies. Hematology/Oncology Clinics of
North America 9(6), 1267–1274 (1995) https://doi.org/10.1016/S0889-8588(18)30044-3. Wilms Tumor
[49] Breslow, N.E., Chatterjee, N.: Design and analysis of two-phase studies with
binary outcome applied to wilms tumour prognosis. Journal of the Royal Sta-
tistical Society. Series C (Applied Statistics) 48(4), 457–468 (1999). Accessed
2024-06-03
[50] Chen, R.J., Lu, M.Y., Wang, J., Williamson, D.F.K., Rodig, S.J., Lindeman, N.I.,
Mahmood, F.: Pathomic fusion: An integrated framework for fusing histopathol-
ogy and genomic features for cancer diagnosis and prognosis. IEEE Transactions
on Medical Imaging 41(4), 757–770 (2022) https://doi.org/10.1109/TMI.2020.3021387
[51] Ostrom, Q.T., Bauchet, L., Davis, F.G., Deltour, I., Fisher, J.L., Langer, C.E.,
Pekmezci, M., Schwartzbaum, J.A., Turner, M.C., Walsh, K.M., et al.: The epi-
demiology of glioma in adults: a “state of the science” review. Neuro-oncology
16(7), 896–913 (2014)
[52] Jenkins, R.B., Blair, H., Ballman, K.V., Giannini, C., Arusell, R.M., Law, M.,
Flynn, H., Passe, S., Felten, S., Brown, P.D., et al.: A t (1; 19)(q10; p10) mediates
the combined deletions of 1p and 19q and predicts a better prognosis of patients
with oligodendroglioma. Cancer research 66(20), 9852–9861 (2006)
[54] Komori, T.: Grading of adult diffuse gliomas according to the 2021 who classifi-
cation of tumors of the central nervous system. Laboratory Investigation 102(2),
126–133 (2022)
[55] Reis, G.F., Pekmezci, M., Hansen, H.M., Rice, T., Marshall, R.E., Molinaro,
A.M., Phillips, J.J., Vogel, H., Wiencke, J.K., Wrensch, M.R., et al.: Cdkn2a loss
is associated with shortened overall survival in lower-grade (world health orga-
nization grades ii–iii) astrocytomas. Journal of Neuropathology & Experimental
Neurology 74(5), 442–452 (2015)
[56] Li, K.K.-W., Shi, Z.-f., Malta, T.M., Chan, A.K.-Y., Cheng, S., Kwan, J.S.H.,
Yang, R.R., Poon, W.S., Mao, Y., Noushmehr, H., et al.: Identification of subsets
of idh-mutant glioblastomas with distinct epigenetic and copy number alterations
and stratified clinical risks. Neuro-Oncology Advances 1(1), 015 (2019)
[57] Stichel, D., Ebrahimi, A., Reuss, D., Schrimpf, D., Ono, T., Shirahata, M.,
Reifenberger, G., Weller, M., Hänggi, D., Wick, W., et al.: Distribution of egfr
amplification, combined chromosome 7 gain and chromosome 10 loss, and tert
promoter mutation in brain tumors and their potential for the reclassification of
idh wt astrocytoma to glioblastoma. Acta neuropathologica 136, 793–803 (2018)
[58] Wemmert, S., Ketter, R., Rahnenfuhrer, J., Beerenwinkel, N., Strowitzki, M.,
Feiden, W., Hartmann, C., Lengauer, T., Stockhammer, F., Zang, K.D., et al.:
Patients with high-grade gliomas harboring deletions of chromosomes 9p and 10q
benefit from temozolomide treatment. Neoplasia 7(10), 883–893 (2005)
[59] Ni, X., Wu, W., Sun, X., Ma, J., Yu, Z., He, X., Cheng, J., Xu, P., Liu, H., Shang,
T., et al.: Interrogating glioma-m2 macrophage interactions identifies gal-9/tim-3
as a viable target against pten-null glioblastoma. Science Advances 8(27), 5165
(2022)
[60] Zhao, H.-f., Zhou, X.-m., Wang, J., Chen, F.-f., Wu, C.-p., Diao, P.-y., Cai, L.-
r., Chen, L., Xu, Y.-w., Liu, J., et al.: Identification of prognostic values defined
by copy number variation, mrna and protein expression of lancl2 and egfr in
glioblastoma patients. Journal of Translational Medicine 19, 1–15 (2021)
[61] Mamlouk, S., Childs, L.H., Aust, D., Heim, D., Melching, F., Oliveira, C., Wolf,
T., Durek, P., Schumacher, D., Bläker, H., et al.: Dna copy number changes define
spatial patterns of heterogeneity in colorectal cancer. Nature communications
8(1), 14093 (2017)
[62] Horn, S., Leonardelli, S., Sucker, A., Schadendorf, D., Griewank, K.G., Paschen,
A.: Tumor cdkn2a-associated jak2 loss and susceptibility to immunotherapy
resistance. JNCI: Journal of the National Cancer Institute 110(6), 677–681
(2018)
[63] Tinsley, E., Bredin, P., Toomey, S., Hennessy, B.T., Furney, S.J.: Kmt2c and
kmt2d aberrations in breast cancer. Trends in Cancer (2024)
[64] Luce, L.N., Abbate, M., Cotignola, J., Giliberto, F.: Non-myogenic tumors display
altered expression of dystrophin (dmd) and a high frequency of genetic alterations.
Oncotarget 8(1), 145 (2017)
[65] Agarwal, S., Parija, M., Naik, S., Kumari, P., Mishra, S.K., Adhya, A.K., Kashaw,
S.K., Dixit, A.: Dysregulated gene subnetworks in breast invasive carcinoma
reveal novel tumor suppressor genes. Scientific Reports 14(1), 15691 (2024)
[66] Xu, Z., Xiang, L., Wang, R., Xiong, Y., Zhou, H., Gu, H., Wang, J., Peng, L.:
Bioinformatic analysis of immune significance of ryr2 mutation in breast cancer.
BioMed Research International 2021(1), 8072796 (2021)
[67] Liu, Z., Liu, L., Jiao, D., Guo, C., Wang, L., Li, Z., Sun, Z., Zhao, Y., Han,
X.: Association of ryr2 mutation with tumor mutation burden, prognosis, and
antitumor immunity in patients with esophageal adenocarcinoma. Frontiers in
genetics 12, 669694 (2021)
[68] Clark, K., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., Sayers, E.W.: Genbank.
Nucleic acids research 44(D1), 67–72 (2016)
Appendix
A Hyperparameters
Table 7 shows the CoxKAN hyperparameters found for each experiment. The mean-
ing of each hyperparameter is described in Section 2.2, except for the initialization
hyperparameters:
• Scale weights (equation 11) are initialized as $w_s = 1$ and $w_b = \frac{1}{n_{in}} + \mathrm{Uniform}([-\xi_b, \xi_b])$, where $\xi_b$ is the "base noise".
• Spline coefficients (equation 12) are initialized as $c_i \sim \mathcal{N}(0, (\xi_s / G)^2)$, where $\xi_s$ is the "spline noise".
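As a rough illustration of the two initialization rules above, the following NumPy sketch draws the parameters for a single layer. It is not the CoxKAN or pykan implementation: the function name `init_kan_layer`, the array shapes, the reading of the scale-weight offset as 1/n_in, and the number of spline coefficients per activation (taken here as grid intervals plus an assumed spline order of 3) are all assumptions for the sake of the example.

```python
import numpy as np

def init_kan_layer(n_in, n_out, grid_intervals, xi_s, xi_b, spline_order=3, seed=0):
    """Draw initial parameters for one layer following the two rules above.

    xi_s and xi_b are the noise hyperparameters written as ξs and ξb in the text.
    spline_order=3 and the coefficient count are assumptions, not from the paper.
    """
    rng = np.random.default_rng(seed)
    # One (w_s, w_b) pair per activation function, i.e. per (input, output) edge.
    w_s = np.ones((n_in, n_out))
    w_b = 1.0 / n_in + rng.uniform(-xi_b, xi_b, size=(n_in, n_out))
    # Spline coefficients c_i ~ N(0, (xi_s / G)^2), with G the number of grid intervals.
    n_coef = grid_intervals + spline_order
    c = rng.normal(0.0, xi_s / grid_intervals, size=(n_in, n_out, n_coef))
    return w_s, w_b, c

# Example using the SUPPORT column of Table 7: first layer of shape [14, 3, 1],
# 3 grid intervals, ξs = 0.11, ξb = 0.05.
w_s, w_b, c = init_kan_layer(n_in=14, n_out=3, grid_intervals=3, xi_s=0.11, xi_b=0.05)
print(w_s.shape, w_b.shape, c.shape)
```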
As mentioned, the default auto-symbolic fitting to CoxKAN acti-
vation functions uses a library of 22 symbolic operators. These are
$\{\sin(x), \tan(x), \arctan(x), \cosh(x), e^{x}, e^{-x^2}, \log(x), \tanh(x), \mathrm{arctanh}(x), \mathrm{sigmoid}(x), \mathrm{sgn}(x), |x|, \sqrt{x}, \frac{1}{\sqrt{x}}, x, x^2, x^3, x^4, \frac{1}{x}, \frac{1}{x^2}, \frac{1}{x^4}\}$
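To make the idea of fitting a symbolic operator to a learned activation concrete, here is a simplified sketch. It takes a subset of the operators listed above (those defined for all real inputs), fits only an outer scale and offset, y ≈ c·f(x) + d, by least squares, and selects the operator with the highest R². The actual auto-symbolic procedure may also tune an affine transform of the input and use a different selection rule; the names `LIBRARY`, `fit_operator` and `best_symbolic_fit` are hypothetical.

```python
import numpy as np

# Subset of the operator library above, restricted to functions defined on all of R.
LIBRARY = {
    "sin(x)": np.sin,
    "arctan(x)": np.arctan,
    "cosh(x)": np.cosh,
    "exp(x)": np.exp,
    "tanh(x)": np.tanh,
    "|x|": np.abs,
    "x": lambda x: x,
    "x^2": lambda x: x**2,
    "x^3": lambda x: x**3,
    "x^4": lambda x: x**4,
}

def fit_operator(f, x, y):
    """Least-squares fit of y ~ c * f(x) + d; returns (r2, c, d)."""
    fx = f(x)
    design = np.column_stack([fx, np.ones_like(fx)])
    (c, d), *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - (c * fx + d)
    r2 = 1.0 - (residual @ residual) / np.sum((y - y.mean()) ** 2)
    return float(r2), float(c), float(d)

def best_symbolic_fit(x, y, library=LIBRARY):
    """Return the operator name with the highest R^2 and its (r2, c, d)."""
    fits = {name: fit_operator(f, x, y) for name, f in library.items()}
    best = max(fits, key=lambda name: fits[name][0])
    return best, fits[best]

# Stand-in for samples of a learned activation function: a scaled, shifted parabola.
x = np.linspace(-2.0, 2.0, 201)
y = 1.5 * x**2 - 0.3
name, (r2, c, d) = best_symbolic_fit(x, y)
print(f"best operator: {name}, c = {c:.3f}, d = {d:.3f}, R^2 = {r2:.6f}")
```

Similar-looking candidates such as cosh(x) and |x| also fit this curve reasonably well, which is why a goodness-of-fit score is needed to choose between them.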
Table 6 shows the DeepSurv hyperparameters found for each experiment. For the
synthetic datasets we did not evaluate DeepSurv and for SUPPORT, GBSG, and
METABRIC we quoted the results from the official DeepSurv publication.
Table 7 CoxKAN hyperparameters found for each experiment.

| Hyperparameter | Gaussian | Shallow | Deep | Difficult | SUPPORT | GBSG | METABRIC | FLCHAIN | NWTCO | TCGA-STAD | TCGA-BRCA | TCGA-GBM/LGG | TCGA-KIRC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KAN Shape | [4,2,1] | [5,1] | [6,5,5,1] | [5,1] | [14,3,1] | [7,2,1] | [9,1] | [8,3,1] | [6,5,1] | [148,1] | [168,1] | [320,1] | [362,4,4,1] |
| Learning Rate | 0.035 | 0.01 | 0.01 | 0.1 | 0.015 | 0.0076 | 0.09 | 0.08 | 0.002 | 0.005 | 0.03 | 0.014 | 0.014 |
| Early Stopping | False | False | True | False | True | True | True | True | False | True | True | True | True |
| Steps | 133 | 107 | (300) | 107 | (300) | (300) | (300) | (300) | 147 | (300) | (300) | (300) | (300) |
| Prune threshold | 0.03 | 0.03 | 0.045 | 0.03 | 0.00007 | 0.045 | 0.035 | 0.001 | 0.02 | 0.008 | 0.007 | 0.034 | 0.012 |
| Grid Intervals | 4 | 5 | 4 | 5 | 3 | 3 | 3 | 3 | 5 | 3 | 3 | 5 | 3 |
| Base fn | linear | silu | linear | silu | linear | silu | silu | linear | linear | linear | silu | silu | linear |
| Spline noise ξs | 0.03 | 0.06 | 0.003 | 0.06 | 0.11 | 0.09 | 0.1 | 0.12 | 0.15 | 0.1 | 0.02 | 0.05 | 0.14 |
| Base noise ξb | 0.13 | 0.14 | 0.16 | 0.14 | 0.05 | 0.18 | 0.03 | 0.04 | 0.16 | 0.01 | 0.009 | 0.04 | 0.11 |
| Reg λ | 0.014 | 0.0001 | 0.01 | 0.0001 | 0.005 | 0.0007 | 0.003 | 0.006 | 0.002 | 0.0004 | 0.013 | 0.01 | 0.01 |