1 Introduction

Exchangeability is the fundamental assumption in machine learning. Traditional machine learning studies prediction under exchangeability (see, e.g., Vapnik, 1998), while newer methods consider deviations from exchangeability (see, e.g., Quiñonero-Candela et al., 2009). The role of exchangeability in conformal prediction, a subarea of machine learning, is briefly reviewed in (Vovk et al., 2022, Sect. 13.5.1).

Testing the assumption of exchangeability is a traditional topic in conformal prediction (see, e.g., Vovk et al., 2022, Part III). It is done in the online mode and is based on conformal test martingales. This area is often referred to as conformal testing.

The classical approach to testing exchangeability, which developed in statistics starting from at least 1943 (Wald & Wolfowitz, 1943), proceeds in the batch mode: we are given the data sequence as one batch rather than getting its elements sequentially one by one; see (Lehmann, 2006, Sect. 7.2) for a review. As always in classical hypothesis testing, testing exchangeability in the batch mode is based on p-values.

In this paper we will adapt standard methods of conformal testing to testing exchangeability in the batch mode. In particular, p-values will be replaced by e-values (Grünwald et al., 2023; Vovk & Wang, 2021), which are widely used in conformal testing: namely, conformal test martingales are obtained by compounding e-values. An important advantage of e-values is that their use facilitates efficient computations.

The null hypothesis of exchangeability will be defined in Sect. 2 using the terminology of compression modelling, widely used in conformal prediction (Vovk et al., 2022, Chap. 11). Compression modelling is an algorithm-free version of Kolmogorov’s way of stochastic modelling: cf. Vovk (2001), Vovk and Shafer (2003), V’yugin (2019, Sect. 2), and Vovk et al. (2022, Sect. 11.6.1). Kolmogorov’s original version will be discussed in Appendix 1.

In Sect. 2 we also define e-variables, which are functions for producing e-values in testing exchangeability (or another null hypothesis). We will derive our main e-variable as likelihood ratio for a Markovian alternative hypothesis, which we will introduce in Sect. 4. A simple optimality property of the likelihood ratios is derived in Sect. 3.

After defining our main alternative hypothesis in Sect. 4, we derive an efficient algorithm for computing the corresponding e-variable. The power of this e-variable is the topic of Sect. 5. The algorithm’s performance in view of the results of Sect. 5 is studied in Sect. 6 using simulated data. Section 7 concludes.

Appendix 1 describes Kolmogorov’s original ideal picture of algorithmic randomness. In the following Appendix 2 we will discuss possible ways of making this picture more practical.

2 Testing exchangeability

We consider the simplest binary case, and our observation space is \(\textbf{Z}:=\{0,1\}\). Fix an integer \(N>1\), which we will refer to as the time horizon. We are interested in binary data sequences \((z_1,\dots ,z_N)\in \Omega :=\textbf{Z}^N\). A Kolmogorov compression model (KCM) is a summarising statistic \(t:\Omega \rightarrow \Sigma\), where \(\Sigma\) is a finite set (the summary space), together with the implicit statement that given the summary \(t(z_1,\dots ,z_N)\) (for which we do not make any stochastic assumptions) the actual data sequence \((z_1,\dots ,z_N)\) is generated from the uniform probability measure on the set \(t^{-1}(t(z_1,\dots ,z_N))\) of all data sequences compatible with the summary. Our null hypothesis is the KCM, which we call the exchangeability compression model (ECM), \(t_E(z_1,\dots ,z_N):=z_1+\dots +z_N\). (In the current binary case this is equivalent to the more standard definition used in (Vovk et al., 2022, Sect. 11.3.1), where the summary is the multiset of the observations \(z_1,\dots ,z_N\).)

KCM and ECM are two of the three main classes of models used in this paper. The third and largest class, called BCM, will be introduced later in this section. Accordingly, the inclusions between the classes will be

$$\begin{aligned} \text{ECM} \subseteq \text{KCM} \subseteq \text{BCM}. \end{aligned}$$
(1)

Let us say that a probability measure P on \(\Omega\) agrees with a summarising statistic t if the data sequences with the same summary have the same P-probability. A probability measure P on \(\Omega\) is exchangeable if \(P(\{(z_1,\dots ,z_N)\})\) depends on \(z_1,\dots ,z_N\) only via \(z_1+\dots +z_N\) (equivalently, via the multiset of \(z_1,\dots ,z_N\)).

Lemma 1

The exchangeable probability measures on \(\Omega\) are exactly the probability measures that agree with the ECM (the mixtures of the uniform probability measures on \(t_E^{-1}(k)\), \(k\in \{0,\dots ,N\}\)).

The easy proof of Lemma 1 is omitted. It shows that, in terms of standard statistical modelling, we can define our null hypothesis as the set of all exchangeable probability measures on \(\Omega\).

An e-variable w.r. to a probability measure P is a nonnegative function on \(\Omega\) whose expectation under P is at most 1. An exchangeability e-variable is a function \(E:\Omega \rightarrow [0,\infty )\) whose average over each \(t_E^{-1}(k)\) is at most 1. Such a function E can be used for testing the assumption of exchangeability: if E is chosen in advance, observing a very large \(E(\omega )\) for the realized outcome \(\omega \in \Omega\) casts doubt on the exchangeability assumption.

Alternatively (and equivalently), an exchangeability e-variable may be defined as an e-variable w.r. to every exchangeable probability measure.

Proposition 2

The two meanings of an exchangeability e-variable coincide.

Proof

If the average of E over each \(t_E^{-1}(k)\) is at most 1, it will be an e-variable w.r. to each exchangeable probability measure by Lemma 1.

Now suppose E is an e-variable w.r. to each exchangeable probability measure. Since the uniform probability measure on \(t_E^{-1}(k)\) is exchangeable, the average of E over \(t_E^{-1}(k)\) will be at most 1. \(\square\)
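For small time horizons, Proposition 2 can be checked by brute force. The following Python sketch (illustrative only; all variable names are ours) constructs a function whose average over each block \(t_E^{-1}(k)\) equals exactly 1 and confirms that its expectation under an arbitrary exchangeable measure, built as a mixture of the uniform measures on the blocks as in Lemma 1, is exactly 1:

```python
from itertools import product
from math import comb
import random

N = 5
omega = list(product((0, 1), repeat=N))

# Build an exchangeability e-variable: take random nonnegative values
# and normalise them so the average over each block t_E^{-1}(k) is 1.
random.seed(0)
raw = {w: random.random() for w in omega}
avg = {k: 0.0 for k in range(N + 1)}
for w in omega:
    avg[sum(w)] += raw[w] / comb(N, sum(w))
E = {w: raw[w] / avg[sum(w)] for w in omega}

for k in range(N + 1):
    a = sum(E[w] for w in omega if sum(w) == k) / comb(N, k)
    assert abs(a - 1.0) < 1e-12       # block averages are 1

# A random exchangeable measure: a mixture of the uniform measures
# on the blocks (cf. Lemma 1).
weights = [random.random() for _ in range(N + 1)]
total = sum(weights)
P = {w: weights[sum(w)] / total / comb(N, sum(w)) for w in omega}

expectation = sum(E[w] * P[w] for w in omega)
assert abs(expectation - 1.0) < 1e-9  # e-variable w.r. to P
```

As the proof of Proposition 2 shows, the expectation under any exchangeable measure is a mixture of the block averages, which is why it comes out as exactly 1 here.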

All null hypotheses discussed in this paper will be KCMs. In the main part of the paper we will concentrate on the ECM, but in this section and the next we will also give more general definitions. An e-variable w.r. to a KCM t is a function \(E:\Omega \rightarrow [0,\infty )\) such that the arithmetic mean of E over \(t^{-1}(\sigma )\) is at most 1 for every \(\sigma \in t(\Omega )\). E-values are values taken by e-variables.

2.1 Disintegration of the alternative hypothesis

Let us fix a simple alternative hypothesis Q, which is a probability measure on \(\Omega\). Our statistical procedures will depend on Q only via the corresponding batch compression model (BCM). A BCM is a pair \((t,P)\) such that \(t:\Omega \rightarrow \Sigma\) is a summarising statistic and \(P:\Sigma \hookrightarrow \Omega\) [to use the notation of (Vovk et al., 2022, Sect. A.4)] is a Markov kernel such that \(P(\sigma )\) is concentrated on \(t^{-1}(\sigma )\) for each \(\sigma \in \Sigma\). As before, we refer to \(t(\omega )\) as the summary of \(\omega\). Kolmogorov compression models are a special case in which each \(P(\sigma )\) is the uniform probability measure on \(t^{-1}(\sigma )\).

Remark 1

Batch compression models are standard and are often used without giving them any name, as in Lauritzen (1988). They are the batch counterpart of online compression models used in conformal prediction (Vovk et al., 2022, Chap. 11). The three classes shown in (1) are used in different contexts in this paper: general BCMs serve as alternative hypotheses, the null hypothesis of interest in the main part of the paper is the ECM, and in the appendix we will discuss more general KCMs as null hypotheses.

With an alternative hypothesis Q and a summarising statistic \(t:\Omega \rightarrow \Sigma\) (serving as null hypothesis) we associate the alternative Markov kernel \(\sigma \in \Sigma \mapsto Q_{\sigma }\) defined by

$$\begin{aligned} Q_{\sigma }(\{\omega \}):= \frac{Q(\{\omega \})}{Q(t^{-1}(\sigma ))}, \quad \sigma \in \Sigma , \hspace{5.0pt}\omega \in t^{-1}(\sigma ). \end{aligned}$$
(2)

(We are mainly interested in alternative hypotheses Q for which the denominator of (2) is always positive, but in general we could set, e.g., \(0/0:=1/2\) in our binary context.) As compared with Q, the alternative Markov kernel loses the information about \(Q(t^{-1}(\sigma ))\) for \(\sigma \in \Sigma\). (And of course, the reader should keep in mind that alternative Markov kernels and Markov alternative hypotheses are completely different objects, despite both being named after Andrei Andreevich Markov Sr.)

3 Frequentist performance of e-variables

Suppose Q (the alternative probability measure) is the true data-generating distribution, and we keep generating data sequences \((z_1,\dots ,z_N)\in \Omega\) from Q in the IID fashion. The following lemma allows us to define the efficiency of an e-variable via its frequentist performance when we keep applying it repeatedly to accumulate capital. This is a special case of Kelly’s criterion (Kelly, 1956).

Lemma 3

Consider an e-variable E w.r. to a Kolmogorov compression model \(t:\Omega \rightarrow \Sigma\). For any alternative probability measure Q on \(\Omega\), the limit

$$\begin{aligned} {{\,\textrm{ep}\,}}_Q(E):= \lim _{I\rightarrow \infty } \frac{1}{I} \ln \prod _{i=1}^I E(z_1^i,\dots ,z_N^i) \end{aligned}$$
(3)

where \((z_1^i,\dots ,z_N^i)\) is the ith data sequence generated from Q independently, exists \(Q^{\infty }\)-almost surely. Moreover, for all E and Q,

$$\begin{aligned} {{\,\textrm{ep}\,}}_Q(E) = \int \ln E \,\textrm{d}Q. \end{aligned}$$
(4)

The interpretation of (3) is that our capital \(\prod _{i=1}^I E(z_1^i,\dots ,z_N^i)\) grows exponentially fast when betting repeatedly using E (we will see later, in Lemma 4, that we can indeed expect it to grow rather than shrink if we can guess a good Q), and its rate of growth is given by the expression (4), which we will refer to as the e-power of E under the alternative Q.

Proof

It suffices to rewrite (3) as

$$\begin{aligned} {{\,\textrm{ep}\,}}_Q(E) = \lim _{I\rightarrow \infty } \frac{1}{I} \sum _{i=1}^I \ln E(z_1^i,\dots ,z_N^i) \end{aligned}$$

and apply Kolmogorov’s law of large numbers to the IID random variables \(\ln E(z_1^i,\dots ,z_N^i)\) with expectation \(\int \ln E \,\textrm{d}Q\) (which exists and is finite since the sample space is assumed to be finite). \(\square\)
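Lemma 3 can be illustrated by simulation: the empirical average of \(\ln E\) over IID draws from Q approaches the e-power (4). The following Monte Carlo sketch (illustrative only; the sticky Markov alternative and all names are our own choices) uses an e-variable for the ECM obtained by normalising Q within each block \(t_E^{-1}(k)\):

```python
from itertools import product
from math import log, comb
import random

random.seed(0)
N = 5
omega = list(product((0, 1), repeat=N))

# An ad hoc non-exchangeable alternative Q: a sticky Markov chain.
def q_prob(w):
    p = 0.5                          # first bit is 0 or 1 equiprobably
    for a, b in zip(w, w[1:]):
        p *= 0.8 if a == b else 0.2  # repeats are more likely
    return p

Q = {w: q_prob(w) for w in omega}
Qt = {k: sum(Q[w] for w in omega if sum(w) == k) for k in range(N + 1)}
# An e-variable for the ECM: Q normalised within each block t_E^{-1}(k).
E = {w: comb(N, sum(w)) * Q[w] / Qt[sum(w)] for w in omega}

exact = sum(Q[w] * log(E[w]) for w in omega)   # e-power (4)

I = 100_000                                    # number of repeated trials
draws = random.choices(omega, weights=[Q[w] for w in omega], k=I)
empirical = sum(log(E[w]) for w in draws) / I  # finite-I version of (3)
assert abs(empirical - exact) < 0.05
```

With the seed fixed, the empirical growth rate of the log capital agrees with (4) to well within Monte Carlo error.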

To justify the expression (4) using frequentist considerations, we do not really need the IID picture, as emphasized by Neyman (Neyman, 1977, Sect. 10). When generating \(z_1^i,\dots ,z_N^i\) for different i, we may test different Kolmogorov compression models \(t=t_i\), perhaps with different time horizons \(N=N_i\), against different alternatives \(Q=Q_i\) and using different \(E_i\). The corresponding generalization of Lemma 3 states that the long-term rate of growth of our capital will be asymptotically close to the arithmetic average of \(\int \ln E_i \,\textrm{d}Q_i\). It will involve certain regularity conditions needed for the applicability of the martingale strong law of large numbers [e.g., in the form of (Shafer & Vovk 2019, Chap. 4), which allows non-stochastic choice of \(N_i\), \(t_i\), \(Q_i\), and \(E_i\)]. If the alternative hypothesis does not hold in all trials, Lemma 3 is still applicable to the trials where it does hold.

Now it is easy to find the optimal, in the sense of \({{\,\textrm{ep}\,}}_Q\), e-variable; it will be the ratio of the alternative Markov kernel to the null hypothesis.

Lemma 4

The maximum of \({{\,\textrm{ep}\,}}_Q\) is attained at

$$\begin{aligned} E(\omega ):= \left| t^{-1}(t(\omega )) \right| Q_{t(\omega )}(\{\omega \}), \quad \omega \in \Omega . \end{aligned}$$
(5)

In this case,

$$\begin{aligned} {{\,\textrm{mep}\,}}(Q):= {{\,\textrm{ep}\,}}_Q(E) = \int \ln \left| t^{-1}(\sigma ) \right| (t_*Q)(\textrm{d}\sigma ) + H(t_*Q) - H(Q), \end{aligned}$$
(6)

where \(t_*Q\) (a probability measure on the summary space \(\Sigma\)) is the push-forward measure

$$\begin{aligned} (t_*Q)(\{\sigma \}):= Q(t^{-1}(\sigma )) \end{aligned}$$

of Q by t (the summarising statistic of the null hypothesis), and \(H(\cdot )\) stands for the entropy.

We will call \({{\,\textrm{mep}\,}}(Q)\) defined by (6) the maximum e-power of the alternative Q. A sizeable \({{\,\textrm{mep}\,}}(Q)\) for a plausible alternative Q means that the testing problem is not hopeless and has some potential.

The guarantee given by Lemma 3, however, is frequentist and not applicable if testing is done only once, in which case we also want the optimal e-variable (5) not to be too volatile.

Proof

In this paper we let \(U_A\) stand for the uniform probability measure on a finite non-empty set A. The optimization \(\int E \,\textrm{d}Q\rightarrow \max\) can be performed inside each block \(t^{-1}(\sigma )\) separately. Using the nonnegativity of the Kullback–Leibler divergence, we have, for each \(\sigma \in t(\Omega )\),

$$\begin{aligned} {{\,\textrm{ep}\,}}_{Q_{\sigma }} \left( \frac{Q_{\sigma }}{U_{t^{-1}(\sigma )}} \right) \ge {{\,\textrm{ep}\,}}_{Q_{\sigma }}(E') \end{aligned}$$

for each e-variable \(E'\) w.r. to t, which implies the first statement (about (5)) of the lemma. The second statement (6) follows from

$$\begin{aligned} {{\,\textrm{ep}\,}}_Q(E)&= \int {{\,\mathrm{\textrm{KL}}\,}}(Q_{\sigma } \mathbin {\Vert } U_{t^{-1}(\sigma )}) (t_*Q)(\textrm{d}\sigma )\\&= \int \left( \ln \left| t^{-1}(\sigma )\right| - H(Q_{\sigma }) \right) (t_*Q)(\textrm{d}\sigma )\\&= \int \ln \left| t^{-1}(\sigma ) \right| (t_*Q)(\textrm{d}\sigma ) + H(t_*Q) - H(Q), \end{aligned}$$

where \({{\,\mathrm{\textrm{KL}}\,}}\) stands for the Kullback–Leibler divergence. \(\square\)
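Both statements of Lemma 4 are easy to confirm numerically for a randomly generated alternative Q and the exchangeability summary: the e-power (4) of the e-variable (5) coincides with the three-term expression (6), and it is nonnegative (being an integral of Kullback–Leibler divergences). A brute-force sketch (names ours):

```python
from itertools import product
from math import log, comb
import random

N = 4
omega = list(product((0, 1), repeat=N))
t = lambda w: sum(w)  # exchangeability summary t_E

random.seed(1)
q = {w: random.random() for w in omega}
Z = sum(q.values())
Q = {w: q[w] / Z for w in omega}                       # random alternative
Qt = {k: sum(Q[w] for w in omega if t(w) == k)
      for k in range(N + 1)}                           # push-forward t_*Q

# Optimal e-variable (5): |t^{-1}(t(w))| * Q_{t(w)}({w}).
E = {w: comb(N, t(w)) * Q[w] / Qt[t(w)] for w in omega}

ep = sum(Q[w] * log(E[w]) for w in omega)              # e-power (4)

# Right-hand side of (6): int ln|t^{-1}| d(t_*Q) + H(t_*Q) - H(Q).
H = lambda p: -sum(x * log(x) for x in p if x > 0)
mep = (sum(Qt[k] * log(comb(N, k)) for k in Qt)
       + H(list(Qt.values())) - H(list(Q.values())))

assert abs(ep - mep) < 1e-9   # the two expressions in (6) agree
assert mep >= -1e-12          # maximum e-power is nonnegative
```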

4 An explicit algorithm for Markov alternatives

Starting from this section we will consider a specific alternative hypothesis obtained by mixing Markov probability measures. The corresponding exchangeability e-variable will be computable in linear time, O(N).

First let us fix some terminology. The exchangeability summary, or exchangeability type, of a data sequence \(z_1,\dots ,z_N\) is the pair \((N_0,N_1)\) of the numbers of 0s and 1s in it. (It carries the same information as just the number of 1s, but we prefer a symmetric definition despite some redundancy.) By a “substring” we always mean a contiguous substring. The Markov type of \(z_1,\dots ,z_N\) is the sextuple \((F,N_{00},N_{01},N_{10},N_{11},L)\), where \(N_{i,j}\) is the number of times \((i,j)\) occurs as a substring in the sequence \(z_1,\dots ,z_N\) (with the comma often omitted), and F and L are the first and last bits of the sequence.

As our alternative hypothesis, we will take the uniform mixture of the Markov probability measures, defined as follows: \(\pi _{01}\) and \(\pi _{10}\) are generated independently from the uniform distribution \(U_{[0,1]}\) on [0, 1]; the first bit is chosen as 1 with probability 1/2, and after that each 0 is followed by 1 with probability \(\pi _{01}\), and each 1 is followed by 0 with probability \(\pi _{10}\). Let us compute the probability of a sequence of a Markov type \((F,N_{00},\dots ,N_{11},L)\) under this probability measure:

$$\begin{aligned} \begin{aligned} \frac{1}{2} \int&(1-\pi _{01})^{N_{00}} \pi _{01}^{N_{01}} \pi _{10}^{N_{10}} (1-\pi _{10})^{N_{11}} \,\textrm{d}\pi _{01} \textrm{d}\pi _{10}\\&= \frac{1}{2} \textrm{B}(N_{00}+1,N_{01}+1) \textrm{B}(N_{10}+1,N_{11}+1)\\&= \frac{1}{2} \frac{ \Gamma (N_{00}+1) \Gamma (N_{01}+1) \Gamma (N_{10}+1) \Gamma (N_{11}+1) }{ \Gamma (N_{0*}+2) \Gamma (N_{1*}+2) }\\&= \frac{1}{2} \frac{ N_{00}! N_{01}! N_{10}! N_{11}! }{ (N_{0*}+1)! (N_{1*}+1)! }, \end{aligned} \end{aligned}$$
(7)

where \(N_{i*}:=N_{i,0}+N_{i,1}\). If \(N_{1-F}=0\), this probability is \(\frac{1}{2N}\) (which in fact agrees with the general expression (7)). We will refer to the probability measure defined by (7) as the UMM probability measure, or UMM alternative, where “UMM” stands for “uniformly mixed Markov”.
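The closed form (7) is straightforward to implement and check: summing it over all binary sequences of length N must give 1, and in the case \(N_{1-F}=0\) it must reduce to \(\frac{1}{2N}\). A Python sketch (function names ours):

```python
from itertools import product
from math import factorial

def umm_prob(seq):
    """Closed-form UMM probability (7) of a binary sequence."""
    n = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for a, b in zip(seq, seq[1:]):     # count transition substrings
        n[a, b] += 1
    f = factorial
    num = f(n[0, 0]) * f(n[0, 1]) * f(n[1, 0]) * f(n[1, 1])
    den = f(n[0, 0] + n[0, 1] + 1) * f(n[1, 0] + n[1, 1] + 1)  # (N_{i*}+1)!
    return 0.5 * num / den

N = 7
total = sum(umm_prob(seq) for seq in product((0, 1), repeat=N))
assert abs(total - 1.0) < 1e-12                       # (7) sums to 1
assert abs(umm_prob((0,) * N) - 1 / (2 * N)) < 1e-15  # the N_{1-F}=0 case
```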

The uniform prior in (7) is used for mathematical convenience and computational efficiency, and it is discussed in greater detail at the end of Appendix 2.

For future use, set \(\pi _{00}:=1-\pi _{01}\) and \(\pi _{11}:=1-\pi _{10}\).

Following (Vovk et al., 2022, Chap. 9), which in turn follows (Ramdas et al., 2022), let us define the lower benchmark

$$\begin{aligned} \textrm{LB}:= \frac{1}{2} \frac{ N_{00}! N_{01}! N_{10}! N_{11}! }{ (N_{0*}+1)! (N_{1*}+1)! (N_0/N)^{N_0} (N_1/N)^{N_1} } \end{aligned}$$
(8)

as the ratio of the UMM alternative (7) to the maximum likelihood under the IID model (which consists of the IID probability measures \(B^N\), B being a probability measure on \(\{0,1\}\)). The idea behind the lower benchmark is that, for any IID probability measure \(B^N\), it is an e-variable w.r. to \(B^N\), i.e., satisfies \(\int \textrm{LB}\,\textrm{d}B^N\le 1\).

However, the IID model is not our null hypothesis, and our null hypothesis of exchangeability is slightly more challenging. Replacing in (8) the maximum likelihood over the IID model by the maximum likelihood over the exchangeable probability measures, we obtain the exchangeability lower benchmark

$$\begin{aligned} \textrm{ELB}:= \frac{1}{2} \left( {\begin{array}{c}N\\ N_1\end{array}}\right) \frac{ N_{00}! N_{01}! N_{10}! N_{11}!}{(N_{0*}+1)!(N_{1*}+1)!}. \end{aligned}$$
(9)

The exchangeability lower benchmark (9) is a bona fide exchangeability e-variable.
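This property can be verified exhaustively for small N: since (9) equals \(\left( {\begin{array}{c}N\\ N_1\end{array}}\right)\) times the UMM probability (7), its average over each block \(t_E^{-1}(k)\) equals the UMM probability of that block, which is at most 1. A Python sketch (names ours):

```python
from itertools import product
from math import factorial, comb

def umm_prob(seq):
    """Closed-form UMM probability (7)."""
    n = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for a, b in zip(seq, seq[1:]):
        n[a, b] += 1
    f = factorial
    return 0.5 * (f(n[0, 0]) * f(n[0, 1]) * f(n[1, 0]) * f(n[1, 1])
                  / (f(n[0, 0] + n[0, 1] + 1) * f(n[1, 0] + n[1, 1] + 1)))

def elb(seq):
    """Exchangeability lower benchmark (9)."""
    return comb(len(seq), sum(seq)) * umm_prob(seq)

N = 6
for k in range(N + 1):
    block = [s for s in product((0, 1), repeat=N) if sum(s) == k]
    average = sum(elb(s) for s in block) / len(block)
    q_block = sum(umm_prob(s) for s in block)
    assert abs(average - q_block) < 1e-12   # block average = Q(block)
    assert average <= 1 + 1e-12             # hence an e-variable
```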

However, our main object of interest in this paper is the more efficient (in the sense of its e-power) e-variable given by Lemma 4 with t being the exchangeability model and Q being the UMM alternative (7). We will refer to this optimal e-variable as the uniformly mixed Markov (UMM) e-variable. A more explicit expression for it and a way of computing it are given below as (14) and Algorithm 1, respectively.

Remark 2

In the spirit of (Koning 2024, Theorem 2) the value of the UMM e-variable on a data sequence \(z_1,\dots ,z_N\) can be written as

$$\begin{aligned} \frac{Q(\{(z_1,\dots ,z_N)\})}{\frac{1}{N!}\sum_{\sigma}Q(\{(z_{\sigma (1)},\dots ,z_{\sigma (N)})\})}, \end{aligned}$$
(10)

where Q is given by (7) and \(\sigma\) ranges over the permutations of \(\{1,\dots ,N\}\). Indeed, the denominator of (10) equals the average of \(Q(\{\omega \})\) over \(\omega \in t^{-1}(t(z_1,\dots ,z_N))\), and so the whole expression (10) equals (5) for \(\omega =(z_1,\dots ,z_N)\) (and t the exchangeability model).

In fact the UMM e-variable dominates the exchangeability lower benchmark. Indeed, the exchangeability lower benchmark replaces the right-hand side of (5) by \(\left| t^{-1}(t(\omega ))\right| Q(\{\omega \})\), and so ignores the denominator in (2). Namely, we have

$$\begin{aligned} \textrm{UMM}(\omega ) = \frac{\textrm{ELB}(\omega )}{Q(t^{-1}(t(\omega )))}. \end{aligned}$$

For the e-power of the exchangeability lower benchmark we have the formula (6) with the second term \(H(t_*Q)\) omitted. Indeed, according to the proof of Lemma 4, that term corresponds to the denominator in (2), which the lower benchmark ignores.

The UMM e-variable and the lower benchmark are not comparable. On the one hand, the lower benchmark is not an exchangeability e-variable in general; it is only an e-variable w.r. to the narrower IID model. This tends to make the lower benchmark larger. On the other hand, the lower benchmark is not admissible under any IID probability measure \(B^N\), in the sense of \(\int \textrm{LB}\,\textrm{d}B^N < 1\), while the UMM e-variable is admissible under any exchangeable probability measure Q, meaning \(\int \textrm{UMM}\,\textrm{d}Q = 1\). This tends to make the UMM e-variable larger.

Remark 3

Notice that the difference between the assumptions of IID and exchangeability, while non-existent in the case of infinite data sequences (by de Finetti’s theorem, every exchangeable probability measure on \(\{0,1\}^{\infty }\) is a mixture of IID probability measures), becomes important for finite data sequences. The difference is quantified in Vovk (1986).

In the rest of this section we will see how to compute efficiently the UMM e-variable, i.e., the likelihood ratio of the UMM alternative Markov kernel (2) to the null Markov kernel. In our derivation we will use the terminology of (Vovk et al., 2005, Sect. 8.6) (such as “Markov graph”) and consider an arbitrary finite observation space \(\textbf{Z}\) (instead of \(\{0,1\}\), as in the rest of this paper); to avoid trivialities, let us assume \(|\textbf{Z}|>1\). We will also use the following facts (Vovk et al., 2005, Lemmas 8.5 and 8.6), which are versions of standard results in graph theory (the BEST theorem and the Matrix-Tree theorem).

Lemma 5

In any Markov graph \(\sigma\) with the set of vertices V the number of Eulerian paths from the source to the sink equals

$$\begin{aligned} T(\sigma ) \frac{{{\,\textrm{out}\,}}(\text {sink})\prod _{v\in V}({{\,\textrm{out}\,}}(v)-1)!}{\prod _{u,v\in V}N_{u,v}!}, \end{aligned}$$
(11)

where \(T(\sigma )\) is the number of spanning out-trees in the underlying digraph rooted at the source, \(N_{u,v}\) is the number of darts leading from u to v, and \({{\,\textrm{out}\,}}(\cdot )\) is the number of darts leaving a given vertex.

Proof

According to Theorem VI.28 in Tutte (1984) [and using the terminology of (Tutte 1984, Chap. VI)], the number of Eulerian tours in the underlying digraph is

$$\begin{aligned} T(\sigma ) \prod _{v\in V}({{\,\textrm{out}\,}}(v)-1)!. \end{aligned}$$

If the source and sink coincide, the number of Eulerian paths is obtained by multiplying this expression by \({{\,\textrm{out}\,}}(\text {source})\). Finally, we erase the identities of different darts going from u to v for each pair of vertices (uv) by dividing by \(N_{u,v}!\); the resulting expression agrees with (11).

Now suppose the source and sink are different vertices. Create a new digraph by adding another dart leading from the sink to the source. The number of Eulerian paths from the source to the sink in the old digraph will be equal to the number of Eulerian tours in the new graph, i.e.,

$$\begin{aligned} T(\sigma ) {{\,\textrm{out}\,}}(\text {sink}) \prod _{v\in V}({{\,\textrm{out}\,}}(v)-1)!, \end{aligned}$$

where \({{\,\textrm{out}\,}}\) refers to the old digraph. It remains to erase the identities of different darts going from u to v for each pair of vertices (uv) in the old digraph; the resulting expression again agrees with (11). Alternatively, we can combine the two cases by always adding another dart leading from the sink to the source. \(\square\)

Lemma 6

To find the number \(T(\sigma )\) of spanning out-trees rooted at the source in the underlying digraph of a Markov graph \(\sigma\) with vertices \(z_1,\dots ,z_n\) (\(z_1\) being the source),

  • create the \(n\times n\) matrix with the elements \(a_{i,j}=-N_{z_i,z_j}\);

  • change the diagonal elements so that each column sums to 0;

  • compute the co-factor of \(a_{1,1}\).

Proof

This lemma can be derived from Theorem VI.28 in Tutte (1984). In that theorem we obtain \(T(\sigma )\) by computing the co-factor of any diagonal element \(a_{i,i}\), but that theorem is about Eulerian digraphs. We can make the underlying digraph of our Markov graph Eulerian by connecting the sink to the source. This operation does not affect the number of out-trees rooted at the source and does not change the co-factor of \(a_{1,1}\). \(\square\)

Let us specialize Lemmas 5 and 6 to the binary case \(\textbf{Z}:=\{0,1\}\).

Corollary 7

Let \(\sigma\) be a Markov graph with vertices in \(\{0,1\}\) and with \(F\in \{0,1\}\) as its source. The number of Eulerian paths from the source to the sink equals

$$\begin{aligned} N(\sigma ):= {\left\{ \begin{array}{ll} N_{F,1-F} \frac{(N_0-1)!(N_1-1)!}{N_{00}!N_{01}!N_{10}!N_{11}!} & \hbox { if}\ N_0\wedge N_1>0\\ 1 & \text {otherwise}, \end{array}\right. } \end{aligned}$$
(12)

where \(N_i:={{\,\textrm{in}\,}}(i)+1_{\{F=i\}}\) (\({{\,\textrm{in}\,}}(i)\) being the number of darts entering i, so that \(N_i\) is the number of occurrences of i on any Eulerian path) and \(N_{i,j}\) (with the comma often omitted) is the number of darts leading from i to j.

Proof

The case \(N_0\wedge N_1=0\) is obvious, so we will assume \(N_0\wedge N_1>0\). The number of spanning out-trees rooted at the source in the underlying digraph is

$$\begin{aligned} T(\sigma ) = N_{F,1-F}; \end{aligned}$$

this follows from Lemma 6 and is obvious anyway. It remains to plug this into Lemma 5: if the source F and sink L coincide, \(F=L\), we obtain

$$\begin{aligned} N_{F,1-F} \frac{(N_F-1)(N_F-2)!(N_{1-F}-1)!}{N_{00}!N_{01}!N_{10}!N_{11}!} \end{aligned}$$

for the number of Eulerian paths from the source to the sink, and if \(F\ne L\), we obtain

$$\begin{aligned} N_{F,1-F} \frac{(N_L-1)(N_F-1)!(N_L-2)!}{N_{00}!N_{01}!N_{10}!N_{11}!}; \end{aligned}$$

both expressions agree with (12). \(\square\)
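Corollary 7 can be checked exhaustively: for every Markov type realized by a binary sequence of a given length, the count (12) must equal the number of sequences of that type. A brute-force sketch (names ours; exact rational arithmetic avoids any divisibility concerns):

```python
from itertools import product
from math import factorial
from fractions import Fraction
from collections import Counter

def markov_type(seq):
    """Markov type (F, N00, N01, N10, N11, L) of a binary sequence."""
    n = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for a, b in zip(seq, seq[1:]):
        n[a, b] += 1
    return (seq[0], n[0, 0], n[0, 1], n[1, 0], n[1, 1], seq[-1])

def count_formula(sigma):
    """Number of Eulerian paths, i.e. of sequences, of Markov type sigma, by (12)."""
    F, n00, n01, n10, n11, L = sigma
    f = factorial
    N0 = n00 + n10 + (1 if F == 0 else 0)   # in-darts to 0, plus 1 if F = 0
    N1 = n01 + n11 + (1 if F == 1 else 0)
    if min(N0, N1) == 0:
        return 1
    n_f = n01 if F == 0 else n10            # N_{F,1-F}
    return Fraction(n_f * f(N0 - 1) * f(N1 - 1),
                    f(n00) * f(n01) * f(n10) * f(n11))

N = 8
counts = Counter(markov_type(s) for s in product((0, 1), repeat=N))
for sigma, c in counts.items():
    assert count_formula(sigma) == c
```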

Combining (7) and (12), we obtain the total alternative weight (i.e., probability under the alternative hypothesis) of

$$\begin{aligned} W(\sigma ):= {\left\{ \begin{array}{ll} \frac{1}{2} N_{F,1-F} \frac{(N_0-1)!(N_1-1)!}{(N_{0*}+1)!(N_{1*}+1)!} & \hbox { if}\ N_{1-F}>0\\ \frac{1}{2N} & \text {otherwise} \end{array}\right. } \end{aligned}$$
(13)

for all data sequences of a given Markov type \(\sigma\).

Under the null hypothesis the probability of a data sequence of exchangeability type \((N_0,N_1)\) is

$$\begin{aligned} 1 / \left( {\begin{array}{c}N\\ N_1\end{array}}\right) , \end{aligned}$$

and so the likelihood ratio (the alternative over the ECM as the null hypothesis) is

$$\begin{aligned} \frac{1}{2} \frac{ N_{00}! N_{01}! N_{10}! N_{11}! \left( {\begin{array}{c}N\\ N_1\end{array}}\right) }{ (N_{0*}+1)! (N_{1*}+1)! \sum _{\sigma }W(\sigma ) } = \frac{ N_{00}! N_{01}! N_{10}! N_{11}! \left( {\begin{array}{c}N\\ N_1\end{array}}\right) }{ (N_{0*}+1)! (N_{1*}+1)! \sum _{\sigma } n_{f,1-f} \frac{(N_0-1)!(N_1-1)!}{(n_{0*}+1)!(n_{1*}+1)!} } \end{aligned}$$
(14)

(see (7) and (13)), where the \(\sigma\) in \(\sum _{\sigma }\) ranges over the Markov types \((f,n_{00},\dots ,n_{11},l)\) compatible with the exchangeability type \((N_0,N_1)\). The equality in (14) holds when \(N_0\wedge N_1>0\); in the case \(N_0\wedge N_1=0\) the likelihood ratio is 1 (and we will treat this case separately in Algorithm 1).

The expression (14) (interpreted as 1 when \(N_0\wedge N_1=0\)) is our main object of interest in this paper; remember that we refer to it as the UMM e-variable.

It remains to explain how to compute the second sum \(\sum _{\sigma }\) in (14) (which is twice as large as \(\sum _{\sigma }W(\sigma )\); in particular, it sums to 2 over all exchangeability types). Assume \(N_0\wedge N_1>0\) and remember that \(N\ge 2\). For \(\sigma =(f,n_{00},\dots ,n_{11},l)\) with \(f=l=0\) (which is only possible when \(N_0\ge 2\)), each such addend in the sum is

$$\begin{aligned} n_{f,1-f} \frac{(N_0-1)!(N_1-1)!}{(n_{0*}+1)!(n_{1*}+1)!} = n_{01} \frac{(N_0-1)!(N_1-1)!}{N_0!(N_1+1)!} = \frac{n_{01}}{N_0 N_1 (N_1+1)}. \end{aligned}$$

A specific Markov type \((f,n_{00},\dots ,n_{11},l)\) is determined (once we know that \(f=l=0\)) by \(n_{01}\), and its other components can be found from the equalities

$$\begin{aligned} n_{01}&=n_{10},\\ N_0&=n_{00}+n_{01}+1,\\ N_1&=n_{01}+n_{11}. \end{aligned}$$

The valid values for \(n_{01}\) are between 1 and \((N_0-1)\wedge N_1\), and so the part of the sum \(\sum _{\sigma }\) corresponding to such \(\sigma\) is

$$\begin{aligned} \sum _{n_{01}=1}^{(N_0-1)\wedge N_1} \frac{n_{01}}{N_0 N_1 (N_1+1)} = \frac{((N_0-1)\wedge N_1) ((N_0-1)\wedge N_1 + 1)}{2 N_0 N_1 (N_1+1)}. \end{aligned}$$
(15)

Both sides are well defined since \(N_0\ge 2\).

For \(\sigma\) with \(f=0\) and \(l=1\), the part of the sum \(\sum _{\sigma }\) corresponding to such \(\sigma\) is

$$\begin{aligned} \sum _{n_{01}=1}^{N_0\wedge N_1} \frac{n_{01}}{N_0 (N_0+1) N_1} = \frac{ (N_0\wedge N_1) (N_0\wedge N_1 + 1) }{2 N_0 (N_0+1) N_1}. \end{aligned}$$
(16)

For \(\sigma\) with \(f=1\) and \(l=0\), the part of the sum \(\sum _{\sigma }\) corresponding to such \(\sigma\) is

$$\begin{aligned} \sum _{n_{10}=1}^{N_0\wedge N_1} \frac{n_{10}}{N_0 N_1 (N_1+1)} = \frac{ (N_0\wedge N_1) (N_0\wedge N_1 + 1) }{2 N_0 N_1 (N_1+1)}. \end{aligned}$$
(17)

Finally, for \(\sigma\) with \(f=l=1\), the part of the sum \(\sum _{\sigma }\) corresponding to such \(\sigma\) is

$$\begin{aligned} \sum _{n_{10}=1}^{N_0\wedge (N_1-1)} \frac{n_{10}}{N_0 (N_0+1) N_1} = \frac{ (N_0\wedge (N_1-1)) (N_0\wedge (N_1-1) + 1) }{2 N_0 (N_0+1) N_1}. \end{aligned}$$
(18)

Both sides of (18) are well defined since \(N_1\ge 2\).

We can simplify the sum of (15), (16), (17), and (18) as follows. If \(N_0<N_1\), the sum simplifies to

$$\begin{aligned} \frac{N_0+N_1+1}{N_1(N_1+1)}, \end{aligned}$$

and if \(N_0=N_1\), the sum simplifies to \(2/(N_0+1)\). (There is no need to consider the case \(N_1<N_0\) because of the symmetry between \(N_0\) and \(N_1\).) Therefore, the sum over \(\sigma\) on the right-hand side of (14) is

$$\begin{aligned} 2\sum _{\sigma }W(\sigma ) = {\left\{ \begin{array}{ll} \frac{N_0+N_1+1}{(N_0\vee N_1)(N_0\vee N_1+1)} & \hbox { if}\ N_0\ne N_1\\ \frac{2}{N_0+1} & \text {otherwise}. \end{array}\right. } \end{aligned}$$
(19)
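The closed form (19) can be verified directly against the definition of the sum: enumerating the Markov types compatible with a given exchangeability type (via the sequences realizing them) and adding the addends \(n_{f,1-f}(N_0-1)!(N_1-1)!/((n_{0*}+1)!(n_{1*}+1)!)\) reproduces the right-hand side of (19). A Python sketch (names ours):

```python
from itertools import product
from math import factorial

def closed_form(n0, n1):
    """Right-hand side of (19), assuming n0 >= 1 and n1 >= 1."""
    if n0 != n1:
        m = max(n0, n1)
        return (n0 + n1 + 1) / (m * (m + 1))
    return 2 / (n0 + 1)

def markov_type(seq):
    n = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for a, b in zip(seq, seq[1:]):
        n[a, b] += 1
    return (seq[0], n[0, 0], n[0, 1], n[1, 0], n[1, 1], seq[-1])

f = factorial
N = 9
for n1 in range(1, N):          # exchangeability types with N0 ^ N1 > 0
    n0 = N - n1
    types = {markov_type(s) for s in product((0, 1), repeat=N)
             if sum(s) == n1}   # compatible Markov types
    direct = sum((n01 if F == 0 else n10)       # n_{f,1-f}
                 * f(n0 - 1) * f(n1 - 1)
                 / (f(n00 + n01 + 1) * f(n10 + n11 + 1))
                 for (F, n00, n01, n10, n11, L) in types)
    assert abs(direct - closed_form(n0, n1)) < 1e-12
```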
Algorithm 1: Computing the UMM exchangeability e-variable

The overall algorithm is presented as Algorithm 1. The value of the uniformly mixed Markov e-variable \(\textrm{UMM}\) is computed according to (14), and the value \(\textrm{ELB}\) of the exchangeability lower benchmark in line 5 is just (14) with the sum over the Markov types \(\sigma\) omitted. The variable \(\text {Sum}\) is set in lines 6–9 to \(\sum _{\sigma }W(\sigma )\) and computed according to (19). The output is returned by the return command, and the algorithm stops as soon as the first such command is issued.

The computational complexity of Algorithm 1 is clearly optimal (to within a constant factor) both time-wise and memory-wise. Namely, the algorithm requires O(N) steps and O(1) memory.
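The computation can be sketched as follows (our own Python rendering, with hypothetical function names, of the steps behind Algorithm 1): evaluate (14) with the denominator sum in the closed form (19), treating \(N_0\wedge N_1=0\) separately, and check the result for a small time horizon against the brute-force definition (5) with Q the UMM measure (7):

```python
from itertools import product
from math import factorial, comb

def umm_prob(seq):
    """Closed-form UMM probability (7) of a binary sequence."""
    n = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for a, b in zip(seq, seq[1:]):
        n[a, b] += 1
    f = factorial
    return 0.5 * (f(n[0, 0]) * f(n[0, 1]) * f(n[1, 0]) * f(n[1, 1])
                  / (f(n[0, 0] + n[0, 1] + 1) * f(n[1, 0] + n[1, 1] + 1)))

def umm_evariable(seq):
    """UMM e-variable via (14), with the denominator sum in closed
    form (19); O(N) time and O(1) working memory, as in Algorithm 1."""
    N = len(seq)
    n1 = sum(seq)
    n0 = N - n1
    if min(n0, n1) == 0:           # degenerate exchangeability type
        return 1.0
    n = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for a, b in zip(seq, seq[1:]):
        n[a, b] += 1
    f = factorial
    num = f(n[0, 0]) * f(n[0, 1]) * f(n[1, 0]) * f(n[1, 1]) * comb(N, n1)
    den = f(n[0, 0] + n[0, 1] + 1) * f(n[1, 0] + n[1, 1] + 1)
    if n0 != n1:                   # the sum over Markov types, by (19)
        m = max(n0, n1)
        s = (n0 + n1 + 1) / (m * (m + 1))
    else:
        s = 2 / (n0 + 1)
    return num / (den * s)

# Brute-force check against (5): E(w) = |block| * Q_{t(w)}({w}),
# with Q the UMM measure and t the exchangeability summary.
N = 7
seqs = list(product((0, 1), repeat=N))
for seq in seqs:
    block = [s for s in seqs if sum(s) == sum(seq)]
    direct = len(block) * umm_prob(seq) / sum(umm_prob(s) for s in block)
    assert abs(umm_evariable(seq) - direct) < 1e-9
```

For numerical stability with large N, the factorials would in practice be accumulated on the log scale, but the structure of the computation is the same.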

5 Maximum e-power of the UMM alternative

In this section we will compute the asymptotic efficiency of the UMM e-variable under the UMM alternative. (In the next section, however, we will see the weakness of our notion of efficiency: it has a long-run frequency interpretation, but the logarithm of the UMM e-variable can be extremely volatile, and so its mathematical expectation can be very different from what we actually expect to observe.)

Proposition 8

Under the UMM alternative Q, the asymptotic e-power of the UMM e-variable \(\textrm{UMM}\) (for time horizon N) satisfies

$$\begin{aligned} \lim _{N\rightarrow \infty } {{\,\textrm{mep}\,}}(Q)/N = \lim _{N\rightarrow \infty } {{\,\textrm{ep}\,}}_Q(\textrm{UMM})/N = \frac{8}{3} \ln 2 + \frac{2}{3} \ln ^2 2 - \frac{7}{36} \pi ^2 - \frac{1}{6} \approx 0.083. \end{aligned}$$

The same expression gives the asymptotic e-power of the exchangeability lower benchmark (and of the lower benchmark).

Proof

Let us compute separately the three components after the “\(=\)” in (6), starting from the last one. When estimating \(-H(Q)\), we need to estimate the frequencies \(N_{00}\), \(N_{01}\), \(N_{10}\), \(N_{11}\) for a Markov chain with transition probabilities \(\pi _{i,j}\). To this end, we define a new Markov chain whose states are the pairs \(z_i z_{i+1}\), \(i=1,\dots ,N-1\), of adjacent states of the old Markov chain with the matrix of transition probabilities

$$\begin{aligned} P:= \begin{pmatrix} \pi _{00} & \pi _{01} & 0 & 0\\ 0 & 0 & \pi _{10} & \pi _{11}\\ \pi _{00} & \pi _{01} & 0 & 0\\ 0 & 0 & \pi _{10} & \pi _{11}\\ \end{pmatrix}; \end{aligned}$$

the rows and columns of this matrix are labelled by the states 00, 01, 10, and 11 of the new Markov chain, in this order. The stationary probabilities for this \(4\times 4\) matrix are

$$\begin{aligned} \left( \frac{\pi _{00}\pi _{10}}{\pi _{01}+\pi _{10}}, \frac{\pi _{01}\pi _{10}}{\pi _{01}+\pi _{10}}, \frac{\pi _{01}\pi _{10}}{\pi _{01}+\pi _{10}}, \frac{\pi _{01}\pi _{11}}{\pi _{01}+\pi _{10}} \right) . \end{aligned}$$
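These stationary probabilities can be checked directly against the defining equation \(\pi P=\pi\), e.g., in exact rational arithmetic (a sketch with arbitrarily chosen transition probabilities):

```python
from fractions import Fraction

p01, p10 = Fraction(1, 3), Fraction(1, 4)   # arbitrary transition probabilities
p00, p11 = 1 - p01, 1 - p10

# Rows and columns are labelled by the states 00, 01, 10, 11, in this order.
P = [[p00, p01, 0, 0],
     [0, 0, p10, p11],
     [p00, p01, 0, 0],
     [0, 0, p10, p11]]

z = p01 + p10
pi = [p00 * p10 / z, p01 * p10 / z, p01 * p10 / z, p01 * p11 / z]

assert sum(pi) == 1                          # a probability vector
for j in range(4):                           # stationarity: pi P = pi
    assert sum(pi[i] * P[i][j] for i in range(4)) == pi[j]
```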

Now, assuming that the observations are generated from a Markov chain with transition probabilities \(\pi _{i,j}\), we obtain (cf. (7))

$$\begin{aligned} \mathbb {E}&\ln \left( \frac{1}{2} \frac{ N_{00}! N_{01}! N_{10}! N_{11}! }{ (N_{0*}+1)! (N_{1*}+1)! } \right) \\&= \mathbb {E}\bigl (N_{00} \ln N_{00} - N_{00} +N_{01} \ln N_{01} - N_{01}\\&\quad +N_{10} \ln N_{10} - N_{10} +N_{11} \ln N_{11} - N_{11}\\&\quad -(N_{00}+N_{01}+1) \ln (N_{00}+N_{01}+1) + (N_{00}+N_{01}+1)\\&\quad -(N_{10}+N_{11}+1) \ln (N_{10}+N_{11}+1) + (N_{10}+N_{11}+1) \bigr )+O(N^{1/2})\\&=\mathbb {E}\biggl (N_{00} \ln \frac{N_{00}}{N_{00}+N_{01}} + N_{01} \ln \frac{N_{01}}{N_{00}+N_{01}}\\&\quad + N_{10} \ln \frac{N_{10}}{N_{10}+N_{11}} + N_{11} \ln \frac{N_{11}}{N_{10}+N_{11}} \biggr )+ O(N^{1/2})\\&= N \frac{\pi _{00}\pi _{10}}{\pi _{01}+\pi _{10}} \ln \pi _{00} + N \frac{\pi _{01}\pi _{10}}{\pi _{01}+\pi _{10}} \ln \pi _{01}\\&\quad + N \frac{\pi _{01}\pi _{10}}{\pi _{01}+\pi _{10}} \ln \pi _{10}+ N \frac{\pi _{01}\pi _{11}}{\pi _{01}+\pi _{10}} \ln \pi _{11} + O(N^{1/2}) \end{aligned}$$

(we are ignoring special cases such as \(N_{00}=0\), which should be considered separately). To find the expectation under the Bayes mixture of the Markov model with the uniform prior on \((\pi _{01},\pi _{10})\), we integrate

$$\begin{aligned} \int _0^1 \int _0^1&\biggl ( \frac{\pi _{00}\pi _{10}}{\pi _{01}+\pi _{10}} \ln \pi _{00} + \frac{\pi _{01}\pi _{10}}{\pi _{01}+\pi _{10}} \ln \pi _{01}\nonumber \\&\quad + \frac{\pi _{01}\pi _{10}}{\pi _{01}+\pi _{10}} \ln \pi _{10} + \frac{\pi _{01}\pi _{11}}{\pi _{01}+\pi _{10}} \ln \pi _{11} \biggr ) \,\textrm{d}\pi _{01}\,\textrm{d}\pi _{10}\nonumber \\&= \frac{2}{3} \ln 2 + \frac{2}{3} \ln ^2 2 - \frac{1}{9} \pi ^2 - \frac{1}{6} \approx -0.481. \end{aligned}$$
(20)
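The closed-form value of (20) can be checked numerically with a simple midpoint rule (a sketch, not part of the proof; the grid size is an arbitrary choice):

```python
import numpy as np

# Midpoint-rule evaluation of the double integral (20) over [0, 1]^2,
# with pi_00 = 1 - pi_01 and pi_11 = 1 - pi_10.
n = 1000
u = (np.arange(n) + 0.5) / n               # midpoints of an n x n grid
p01, p10 = np.meshgrid(u, u, indexing="ij")
p00, p11 = 1 - p01, 1 - p10
s = p01 + p10

integrand = (p00 * p10 * np.log(p00) + p01 * p10 * np.log(p01)
             + p01 * p10 * np.log(p10) + p01 * p11 * np.log(p11)) / s
value = integrand.mean()                   # each cell has area 1/n**2

exact = (2/3) * np.log(2) + (2/3) * np.log(2)**2 - np.pi**2 / 9 - 1/6
assert abs(value - exact) < 5e-3           # exact is approximately -0.481
```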

Now let us estimate the first term

$$\begin{aligned} \int \ln \left| t^{-1}(\sigma ) \right| (t_*Q)(\textrm{d}\sigma ) \end{aligned}$$

after the “\(=\)” in (6). Set \(K:=\sigma\) (this is the number of 1s), and suppose the observations are generated from a Markov chain with given transition probabilities \(\pi _{01}\) and \(\pi _{10}\). We then have

$$\begin{aligned} \mathbb {E}\left( \ln \left( {\begin{array}{c}N\\ K\end{array}}\right) \right)&= \mathbb {E}\left( \ln \frac{N!}{K!(N-K)!} \right) = \mathbb {E}\left( \ln \frac{(N/e)^N}{\left( \frac{K}{e}\right) ^K\left( \frac{N-K}{e}\right) ^{N-K}} \right) + O(N^{1/2})\\&= \mathbb {E}\left( -K\ln \frac{K}{N} - (N-K)\ln \left( 1-\frac{K}{N}\right) \right) + O(N^{1/2})\\&= -N\pi _1\ln \pi _1 - N\pi _0\ln \pi _0 + O(N^{1/2}), \end{aligned}$$

where \(\pi _0\) and \(\pi _1\) are the stationary probabilities

$$\begin{aligned} \pi _0:= \frac{\pi _{10}}{\pi _{01}+\pi _{10}} \text { and } \pi _1:= \frac{\pi _{01}}{\pi _{01}+\pi _{10}} \end{aligned}$$

of the Markov chain. It remains to take the integral

$$\begin{aligned} -\int _0^1 \int _0^1&\left( \pi _0\ln \pi _0 + \pi _1\ln \pi _1 \right) \,\textrm{d}\pi _{01}\,\textrm{d}\pi _{10} = -2 \int _0^1 \int _0^1 \left( \pi _0\ln \pi _0 \right) \,\textrm{d}\pi _{01}\,\textrm{d}\pi _{10}\nonumber \\&= -2 \int _0^1 \int _0^1 \left( \frac{\pi _{10}}{\pi _{01}+\pi _{10}} \ln \frac{\pi _{10}}{\pi _{01}+\pi _{10}} \right) \,\textrm{d}\pi _{01}\,\textrm{d}\pi _{10}\nonumber \\&= 2\ln 2 - \frac{1}{12} \pi ^2 \approx 0.564. \end{aligned}$$
(21)
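Both steps above admit quick numerical checks (sketches, not part of the proof): the Stirling-type approximation of \(\ln \left( {\begin{array}{c}N\\ K\end{array}}\right)\) via `math.lgamma`, and the value of the double integral (21) via a midpoint rule:

```python
import math
import numpy as np

# 1. Entropy approximation of ln C(N, K): the error is O(ln N),
#    well within the O(N^{1/2}) slack claimed above.
N, K = 10**5, 3 * 10**4
log_binom = math.lgamma(N + 1) - math.lgamma(K + 1) - math.lgamma(N - K + 1)
p = K / N
approx = -K * math.log(p) - (N - K) * math.log(1 - p)
assert abs(log_binom - approx) < 10

# 2. Midpoint-rule evaluation of the double integral (21).
n = 1000
u = (np.arange(n) + 0.5) / n               # midpoints of an n x n grid
p01, p10 = np.meshgrid(u, u, indexing="ij")
pi0 = p10 / (p01 + p10)                    # stationary probability of 0
pi1 = p01 / (p01 + p10)                    # stationary probability of 1
value = -(pi0 * np.log(pi0) + pi1 * np.log(pi1)).mean()

exact = 2 * np.log(2) - np.pi**2 / 12      # approximately 0.564
assert abs(value - exact) < 5e-3
```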

The final term \(H(t_*Q)\) in (6) can be ignored. Indeed, using the last expression in (7), we can bound the probability \((t_*Q)(\{K\})\), for any \(K\in \{1,\dots ,N-1\}\), by 1 from above and by \(1/(2N^3)\) from below:

$$\begin{aligned} (t_*Q)(\{K\}) \ge \frac{1}{2} \frac{(N-K-1)!0!1!(K-1)!}{(N-K)!(K+1)!} = \frac{1}{2(N-K)K(K+1)} \ge \frac{1}{2N^3} \end{aligned}$$
(22)

(the expression after the first “\(\ge\)” being the probability of the sequence consisting of K 1s followed by \(N-K\) 0s). Therefore, \(H(t_*Q)=O(\ln N)\). (As always, the extreme cases \(K\in \{0,N\}\) should be considered separately.)
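The factorial identity behind (22) can be verified exactly with rational arithmetic (a quick check, using the counts as in the displayed formula; the ranges of N are an arbitrary choice):

```python
from fractions import Fraction
from math import factorial

# Check the identity behind (22): the probability of K 1s followed by
# N - K 0s equals 1 / (2 (N-K) K (K+1)), which is at least 1 / (2 N^3).
for N in range(3, 60):
    for K in range(1, N):
        prob = Fraction(factorial(N - K - 1) * factorial(K - 1),
                        2 * factorial(N - K) * factorial(K + 1))
        assert prob == Fraction(1, 2 * (N - K) * K * (K + 1))
        assert prob >= Fraction(1, 2 * N**3)
```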

Combining (20) and (21), we obtain the coefficient

$$\begin{aligned} \frac{8}{3} \ln 2 + \frac{2}{3} \ln ^2 2 - \frac{7}{36} \pi ^2 - \frac{1}{6} \approx 0.083 \end{aligned}$$
(23)

in front of N in the asymptotic expression for \({{\,\textrm{ep}\,}}_Q(\textrm{UMM})\).

The proof shows that the asymptotic e-power is the same for the exchangeability lower benchmark, and a simple calculation using Stirling’s formula (see, e.g., Vovk et al., 2022, Proposition 9.2) shows that we also have the same asymptotic e-power for the lower benchmark. \(\square\)
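The arithmetic combining (20) and (21) into the coefficient (23) is immediate to verify:

```python
import math

ln2 = math.log(2)
eq20 = (2/3) * ln2 + (2/3) * ln2**2 - math.pi**2 / 9 - 1/6   # approx -0.481
eq21 = 2 * ln2 - math.pi**2 / 12                              # approx  0.564
eq23 = (8/3) * ln2 + (2/3) * ln2**2 - 7 * math.pi**2 / 36 - 1/6

assert abs((eq20 + eq21) - eq23) < 1e-12   # (20) + (21) = (23)
assert abs(eq23 - 0.083) < 5e-4            # the stated numerical value
```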

Proposition 8 states that the e-powers of the UMM e-variable and of the exchangeability lower benchmark are close asymptotically, and its proof gives a crude argument that is still sufficient to demonstrate this. The following corollary of the previous section’s results establishes much more precise relations between the UMM e-variable and the exchangeability lower benchmark.

Corollary 9

It is always true that

$$\begin{aligned} 1 \le \frac{\textrm{UMM}}{\textrm{ELB}} \le 2N. \end{aligned}$$
(24)

Moreover,

$$\begin{aligned} \frac{\textrm{UMM}}{\textrm{ELB}} = {\left\{ \begin{array}{ll} \frac{2(N_0\vee N_1)(N_0\vee N_1+1)}{N_0+N_1+1} & \hbox { if}\ N_0\ne N_1\\ N_0+1 & \text {otherwise}. \end{array}\right. } \end{aligned}$$
(25)

Proof

In the case \(N_0\wedge N_1>0\), the relation (25) follows from (19). If \(N_0=0\) or \(N_1=0\), the expression on the right-hand side of (25) becomes 2N, which agrees with the last expression (which simplifies to 1/(2N)) on the right-hand side of the chain (7).

For a fixed sum \(N_0+N_1\), the maximum of the right-hand side of (25) is attained for \(N_0=0\) or \(N_1=0\), and the maximum is 2N. This proves (24). \(\square\)
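The bounds (24) can also be confirmed by brute force from the explicit ratio (25) (a small check, not part of the proof; the range of N is an arbitrary choice):

```python
# Check 1 <= UMM / ELB <= 2N from the explicit ratio (25),
# for all splits N_0 + N_1 = N with N up to 300.
for N in range(1, 301):
    for N0 in range(N + 1):
        N1 = N - N0
        if N0 != N1:
            M = max(N0, N1)                    # N_0 v N_1
            ratio = 2 * M * (M + 1) / (N0 + N1 + 1)
        else:
            ratio = N0 + 1
        assert 1 <= ratio <= 2 * N
```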

6 Computational experiments

In this section we will conduct three groups of experiments involving the two lower benchmarks and the UMM exchangeability e-variable. The first group is the main one, and in it the true data distribution is a specific Markov probability measure with the initial probability of 1 equal to 1/2. In this case, we define another benchmark (as in Vovk et al., 2022, Sect. 9.2.5), the upper benchmark, as

$$\begin{aligned} \textrm{UB}:= \frac{1}{2} \frac{ N_{00}! N_{01}! N_{10}! N_{11}! }{ (N_{0*}+1)! (N_{1*}+1)! \pi _0^{N_0} \pi _1^{N_1} } \end{aligned}$$
(26)

(cf. (7)), where \(\pi _0\) and \(\pi _1\) are the stationary probabilities under the true data-generating distribution. We can see that the upper benchmark is an e-variable (a likelihood ratio) only with respect to a specific IID probability measure, and so it is not even an IID e-variable. Therefore, we should not be surprised if the upper benchmark exceeds a bona fide exchangeability e-variable; there are two elements of cheating in interpreting the upper benchmark as a measure of evidence against the null hypothesis of exchangeability: first, it tests the IID property rather than exchangeability, and second, it tests only one individual IID measure.
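As an illustration, the upper benchmark (26) can be computed directly from the transition counts of a binary sequence. The helper `upper_benchmark` below is hypothetical (not from the paper); it assumes the notation of (7), with the default stationary probabilities \(\pi_0=\pi_1=0.5\) corresponding to the symmetric case:

```python
from math import factorial

def upper_benchmark(z, pi0=0.5, pi1=0.5):
    """Upper benchmark (26) for a binary sequence z; pi0 and pi1 are the
    stationary probabilities of the true data-generating Markov chain."""
    n = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(z, z[1:]):             # transition counts N_ab
        n[(a, b)] += 1
    n0s = n[(0, 0)] + n[(0, 1)]            # N_{0*}
    n1s = n[(1, 0)] + n[(1, 1)]            # N_{1*}
    n0, n1 = z.count(0), z.count(1)        # N_0 and N_1
    num = factorial(n[(0, 0)]) * factorial(n[(0, 1)]) \
        * factorial(n[(1, 0)]) * factorial(n[(1, 1)])
    den = factorial(n0s + 1) * factorial(n1s + 1) * pi0**n0 * pi1**n1
    return 0.5 * num / den

ub = upper_benchmark([1, 1, 1, 0, 0, 0, 0, 1])
```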

Fig. 1 The four e-values and related quantities, as described in the text. Left panel: \(N=20\) and \(\pi _{01}=\pi _{10}=0.1\). Right panel: \(N=400\) and \(\pi _{01}=\pi _{10}=0.4\). Only ELB and UMM are bona fide exchangeability e-values. The number of simulations is \(K=10^5\) in both panels

Our results for specific Markov alternatives are given in Fig. 1. This figure contains boxplots for \(K:=10^5\) simulations of four values: the exchangeability lower benchmark \(\textrm{ELB}\) (given by (9)), the lower benchmark \(\textrm{LB}\) (given by (8)), the upper benchmark \(\textrm{UB}\) (given by (26)), and the UMM exchangeability e-variable \(\textrm{UMM}\) (given by Algorithm 1). Only two of these, \(\textrm{ELB}\) and \(\textrm{UMM}\), are bona fide exchangeability e-variables.

The time horizon N and the transition probabilities for the two panels are given in the caption.

In both panels of Fig. 1 we consider symmetric Markov chains, \(\pi _{01}=\pi _{10}\), as alternatives to exchangeability. The observations are generated from those alternative probability measures. In the left panel we consider an “easy” case, \(\pi _{01}=0.1\), in the sense of being easily distinguishable from the case of exchangeability, \(\pi _{01}=0.5\). The case in the right panel, \(\pi _{01}=0.4\), is closer to exchangeability and thus more difficult. To decide which e-values are most interesting in practice, we used Jeffreys’s rule of thumb (Jeffreys, 1961, Appendix 2) involving thresholds for e-values between \(10^{1/2}\) and 100. In the easy case, \(N=20\) observations are sufficient for the UMM e-variable to produce typical e-values that are of the same order of magnitude as Jeffreys’s thresholds. In the difficult case, we need more observations for that, and we set \(N:=400\).

UMM performs better than LB in both panels and, of course, better than ELB (we know that UMM dominates ELB). ELB and LB often fail to achieve Jeffreys’s low threshold of \(10^{1/2}\) for substantial evidence against the null hypothesis. It is interesting that \(\textrm{UMM}\) is often even higher than the upper benchmark, as in the right panel of Fig. 1.

Table 1 Numerical values for the decimal logarithms of the two lower benchmarks and the UMM e-variable shown in Fig. 1

Table 1 gives more precise numerical values that can be read off Fig. 1 only very approximately. The bars stand for the empirical averages of the decimal logarithms of ELB, LB, and UMM over the same \(K:=10^5\) simulations as in Fig. 1. The table also gives the difference between the empirical averages of the UMM and ELB and the upper bound for the difference given by (24).

According to Corollary 9, the UMM e-value cannot differ from the exchangeability lower benchmark by much. The upper bound (24) holds and is not excessively loose.

Fig. 2 Two exchangeability e-values (ELB and UMM) and two approximations (LB and UB) under the null hypothesis. Left panel: the probability of 1 is 0.5. Right panel: the probability of 1 is 0.1. The number of observations is \(N=20\), and the number of simulations is \(K=10^5\)

Figure 2 describes the second group of experiments and explores the behaviour of ELB, LB, UB, and UMM under the null hypothesis (as suggested by a referee). In the left panel the probability of 1 is 0.5, and all four are valid e-variables; while UB is not valid under exchangeability in general, it is valid under this particular exchangeable probability measure. The number of observations is \(N=20\). The UMM e-variable performs best in this case. The right panel has 0.1 as the probability of 1, which makes UB (still based on \(\pi _0=\pi _1=0.5\)) grossly invalid. Among the valid e-variables, UMM still performs best.

Fig. 3 Two exchangeability e-values (ELB and UMM) and two approximations (LB and UB) under the UMM alternative. Left panel: \(N=10^3\). Right panel: \(N=10^5\). The number of simulations is still \(K=10^5\)

The third group of experiments involves generating the binary observations from the UMM alternative (which is not Markov any longer). The explicit formula for this alternative is given in (7), but it is easier to generate \(\pi _{01}\) and \(\pi _{10}\) from the uniform distribution on \([0,1]^2\) and then generate the observations from the Markov chain with these parameters. This interpretation of the UMM alternative shows that our algorithm for testing exchangeability is now in a hostile environment: with a sizeable probability we will get \(\pi _{01}\approx \pi _{10}\), i.e., difficult data sequences that look almost exchangeable.
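A minimal sketch of this two-stage sampling scheme (the function name `sample_umm` and the seeding are our own choices; the initial probability of 1 is 1/2, as in the first group of experiments):

```python
import random

def sample_umm(N, seed=0):
    """Draw one binary sequence of length N from the UMM alternative:
    first (pi01, pi10) is drawn uniformly from [0, 1]^2, and then the
    observations are generated from the Markov chain with these
    transition probabilities (initial probability of 1 equal to 1/2)."""
    rng = random.Random(seed)
    p01, p10 = rng.random(), rng.random()
    z = [int(rng.random() < 0.5)]
    for _ in range(N - 1):
        flip_prob = p01 if z[-1] == 0 else p10   # probability of changing state
        z.append(1 - z[-1] if rng.random() < flip_prob else z[-1])
    return z

z = sample_umm(1000)
```

With sizeable probability the drawn parameters satisfy \(\pi_{01}\approx\pi_{10}\), producing the nearly exchangeable sequences that make this environment hostile.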

Figure 3 shows results for this case; in the expression (26) for the upper benchmark, we still set \(\pi _{0}:=\pi _{1}:=0.5\). It is striking how spread out the distributions for the three benchmarks and the UMM e-variable are, demonstrating the hostile nature of the testing environment. They are also skewed, with the mean very different from the median. To obtain UMM e-values that are consistently in Jeffreys’s range, we now need much larger values of N, such as \(10^3\), shown in the left panel of Fig. 3. The lack of validity for the upper benchmark is very obvious in Fig. 3: it takes much larger values, and we do not even include the whole boxplots for it.

Table 2 Some figures for the decimal logarithms of the two lower benchmarks and the UMM e-variable

Table 2, which is analogous to Table 1, gives more precise numbers related to Fig. 3. As before, the bars stand for the empirical averages of the decimal logarithms over \(K=10^5\) replications, and N is the time horizon. Now we also have “as.”, the common theoretical asymptotic value for the UMM e-variable and exchangeability lower benchmark obtained from (23) by dividing by \(\ln 10\) (to convert natural logarithms to decimal ones) and multiplying by the sample size N. As expected, the approximation is least accurate for \(N=10^3\). The table also gives the average differences between the UMM e-variable and exchangeability lower benchmark on the \(\log _{10}\) scale, together with the upper bound given by (24). The upper bound still holds.

7 Conclusion

In this paper the algorithm for computing the UMM e-variable was fully developed only in the binary case. A natural next step would be to extend it to any finite observation space \(\textbf{Z}\). (A big chunk of Sect. 4, following (Vovk et al., 2005, Sect. 8.6), presented the combinatorics for an arbitrary finite observation space \(\textbf{Z}\).) It would be interesting to determine the computational complexity of such an extension of Algorithm 1 in general as a function of N and \(|\textbf{Z}|\).

The topic of this paper has been testing the exchangeability compression model in the batch mode using Markov alternatives. There are many other interesting null hypotheses among Kolmogorov compression models, and there are many interesting alternatives. For example, in Vovk et al. (2022, Chap. 9) we discussed, alongside Markov alternatives, detecting changepoints. Our discussion there was in the online mode, but for changepoint detection the batch mode is no less important (Vovk et al., 2022, Remark 8.19); e.g., its role has been increasing in bioinformatics (including DNA analysis). Using e-values in changepoint detection is particularly convenient when multiple hypothesis testing is involved (as it often is in batch changepoint detection).