The polynomial set associated with a fixed number of matrix-matrix multiplications

Elias Jarlebring and Gustaf Lorentzon

(August 13, 2025)

Abstract

We consider the problem of computing matrix polynomials $p(X)$ , where $X$ is a large dense matrix, with as few matrix-matrix multiplications as possible. More precisely, let $\Pi_{2^{m}}^{*}$ represent the set of polynomials computable with $m$ matrix-matrix multiplications, but with an arbitrary number of matrix additions and scaling operations. We characterize this set through a tabular parameterization. By deriving equivalence transformations of the tabular representation, we establish new methods that can be used to construct elements of $\Pi_{2^{m}}^{*}$ and determine general properties of the set. The transformations allow us to eliminate variables and prove that the dimension is bounded by $m^{2}$ , which is subsequently proven to be sharp, i.e., $\dim(\Pi_{2^{m}}^{*})=m^{2}$ . Consequently, we have identified a parameterization that, to the best of our knowledge, is the first minimal parameterization. We also conduct a study using computational tools from algebraic geometry to determine the largest degree $d$ such that all polynomials of that degree belong to $\Pi_{2^{m}}^{*}$ , or its closure. In many cases, the computational setup is constructive in the sense that it can also be used to determine a specific evaluation scheme for a given polynomial.

1 Introduction

The application that motivates the research question in this paper, is the computation of matrix functions in the sense of Higham [13], which is a classical problem in numerical linear algebra. We define a matrix function as an extension of a scalar function from $f:\mathbb{C}\rightarrow\mathbb{C}$ to matrices, i.e., $f:\mathbb{C}^{n\times n}\rightarrow\mathbb{C}^{n\times n}$ . Important matrix functions like $e^{X}$ , $\text{sign}(X)$ , and $\sqrt{X}$ are crucial in various contexts such as linear ODEs [18], control theory [5], network analysis [7], and quantum chemistry [22]; see [13, Chapter 2] for more applications. Further applications relevant for our setting appear in problems where the action of a matrix function on a vector $b\in\mathbb{C}^{n}$ , i.e., $f(X)b$ is needed, see [9].

In this paper we study methods for computing matrix functions when the input matrix $X\in\mathbb{C}^{n\times n}$ is very large and dense. In particular, we consider a family of methods that only utilize the following two operation types:

•

Linear combination of two matrices $Z\leftarrow\alpha X+\beta Y$
•

Multiplication of two matrices $Z\leftarrow X\cdot Y$ .

A direct consequence of considering only these basic operation types is that this family of methods computes matrix polynomials. Another direct consequence is that the first operation type can be viewed as free in terms of computational cost, since the computational complexity is $O(n^{2})$ and $O(n^{3})$ respectively for the two considered operation types.

We want to study the polynomials that can be computed with a given cost. Since the cost is essentially given by the number of matrix-matrix multiplications, methods with a given fixed cost corresponding to $m$ multiplications form evaluations of polynomials in the set

(1)

\Pi_{2^{m}}^{*}:=\{p\in\widebar{\Pi}_{2^{m}}\hskip-3.0pt:\hskip-1.0pt\text{$p(X)$ is computable with $m$ matrix-matrix multiplications}\},

where $\Pi_{d}$ is the set of all polynomials of degree $d$ in the usual sense. Here, $\widebar{\Pi}_{d}$ is the closure of $\Pi_{d}$ , that is, the set of polynomials of degree $\leq d$ .

Although our application stems from matrix polynomials, the set $\Pi_{2^{m}}^{*}$ is, in fact, a univariate polynomial set, not a matrix polynomial set; it is a univariate semi-algebraic set, as we concretize in Section 2. The set can be defined more abstractly as follows. Let $\Pi=\mathbb{K}[x]$ be the polynomial ring over a field $\mathbb{K}$ ; typically the polynomial coefficients are scalars with $\mathbb{K}=\mathbb{C}$ or $\mathbb{K}=\mathbb{R}$ , such that $\Pi_{d}\subset\Pi$ is the vector space of polynomials in $\mathbb{K}[x]$ of degree at most $d$ . We make the following assumptions about computations involving elements of $\Pi$ :

•

Linear combination of two polynomials $r\leftarrow\alpha p+\beta q$ is considered computationally free, where $p,q\in\Pi$ are polynomials, and $\alpha,\beta\in\mathbb{K}$ are scalars.
•

Multiplication of two polynomials $r\leftarrow p\cdot q$ , where $p,q\in\Pi$ , incurs unit computational cost, independent of the degrees of $p$ and $q$ .

Now, if we fix a computational budget to $m$ , i.e., fix the number of non-scalar multiplications, the total space of computable polynomials is restricted. Since each multiplication can at most double the degree (i.e., $\deg(fg)\leq\deg(f)+\deg(g)$ ), the maximal degree reachable using $m$ multiplications is bounded by $2^{m}$ . The definition of $\Pi^{*}_{2^{m}}\subset\Pi_{2^{m}}$ in (1) corresponds to all polynomials in $\Pi$ that can be computed using at most $m$ non-scalar multiplications, and an arbitrary number of free linear combination operations. Although the matrix polynomial evaluation application is our main source of interest, there are other applications, for example, when the matrix $X$ is replaced by a (scalar) polynomial; see the discussion and references in Section 6.

We also wish to stress the difference of this setting in comparison to the approximation theory assumptions. A common application of matrix polynomials involves the approximation of a non-polynomial function $f$ . In such applications, one often wants to compute a polynomial approximation $p$ of $f$ using a fixed cost. Hence, we want to find the best approximation in $\Pi_{2^{m}}^{*}$ . This is in complete contrast to the classical problem in (scalar) approximation theory, where one seeks the best approximation of a given fixed degree. Since $\Pi_{2^{m}}^{*}\neq\widebar{\Pi}_{2^{m}}$ in general, the fixed degree and fixed cost approximation problems are different in character. In order to construct methods for the fixed cost approximation problem we need to understand the set $\Pi_{2^{m}}^{*}$ . The objective of this paper is to characterize this set.

Techniques to keep the number of matrix-matrix multiplications low have been studied for decades in the context of matrix functions and matrix polynomials. In Paterson and Stockmeyer’s seminal paper [19] an algorithm is presented to compute $p(X)$ for $p\in\widebar{\Pi}_{d}$ in $m=\mathcal{O}(\sqrt{d}$ ) matrix-matrix multiplications, which was made more precise in [8]. Since they also show that dim $(\Pi_{2^{m}}^{*})=\mathcal{O}(m^{2})$ = $\mathcal{O}(d)$ , the Paterson and Stockmeyer algorithm is optimal in the asymptotic order sense.

The fact that the Paterson and Stockmeyer algorithm is often improvable has served as the motivation for a number of recent works. For example, Sastre [24] showed how to improve the Paterson–Stockmeyer method in general; the result shows how to compute most polynomials of degree $8$ using only three multiplications, and most polynomials of degree $12$ using only four multiplications (in contrast to the Paterson–Stockmeyer algorithm that handles degrees $6$ and $9$ , respectively). The work has been expanded in various ways (e.g., [25]). This research demonstrates how an approximation of a general function, such as the Taylor expansion of the exponential, can be computed using $m=4$ multiplications. Specifically, the approximation known as 15+ corresponds to a polynomial of degree 16 that matches all Taylor coefficients except for the last, i.e., the coefficient for $X^{16}$ . Related concepts for rational functions were examined in [23]. Further refinements of the algorithms, especially regarding the scaling-and-squaring method [18, Section 7], have been investigated for both the matrix exponential [27] and the matrix cosine [29] as well as in [28]. Ideas for efficient polynomial evaluation combined with rational approximations have been used in [2, 3], with particular focus on preservation of Lie algebra properties for the matrix exponential. In general, these types of efficient polynomial methods are (at least in theory) better than the standard implementation of the matrix exponential using a Padé approximant combined with the scaling-and-squaring algorithm [1], although further research is needed to definitively support this claim.

In the terminology of our framework, considerable parts of the results in the literature discussed above, are examples involving polynomials that are elements of $\Pi_{2^{m}}^{*}$ . Despite this research attention, the understanding of this set is far from complete. The basis of our analysis is an evaluation scheme (further explained in Section 2) that gives a parameterization of $\Pi_{2^{m}}^{*}$ with parameters given by a triplet $(A,B,c)$ , where $A$ and $B$ are matrices and $c$ a vector. One triplet $(A,B,c)$ corresponds to one polynomial $p\in\Pi_{2^{m}}^{*}$ . There are several ways to obtain the same polynomial; that is, a different triplet $(\hat{A},\hat{B},\hat{c})$ may lead to the same polynomial $p$ .

The first set of new results are equivalence transformations. In Section 3, we present procedures to transform a triplet $(A,B,c)$ into another triplet $(\hat{A},\hat{B},\hat{c})$ that yields the same polynomial $p\in\Pi_{2^{m}}^{*}$ . Several conclusions can be drawn from the transformations. For example, we prove that the first column of the matrices $A$ and $B$ can be selected as zero without loss of generality. More complex transformations reveal that the $(2,2)$ element of $A$ can also be set to zero. Moreover, we establish a transformation related to the third multiplication, which includes an algebraic condition on the elements of $A$ and $B$ . Further analysis shows that the $(3,3)$ element of $A$ can be coupled in a simple way to the $(3,3)$ element of $B$ without loss of generality.

The starting point of Section 4 are conclusions of the equivalence transformations. Using the transformations we can reduce the number of free parameters that parameterize $\Pi_{2^{m}}^{*}$ and conclude that

\dim(\Pi_{2^{m}}^{*})\leq m^{2}\textrm{ for }m\geq 3.

This bound is sharper than those presented in, for example, [19]. Moreover, we prove that this is optimal and, to our knowledge, our parameterization therefore forms the first minimal parameterization of unreduced evaluation schemes in $\Pi_{2^{m}}^{*}$ .

Further results are presented (in Section 5) regarding the problem of finding included polynomial subsets, specifically determining

(2)

\max\{d:\Pi_{d}\subset\widebar{\Pi_{2^{m}}^{*}}\}.

Using the transformations and tools from computational algebraic geometry, we solve this problem for $m=4$ and provide conjectures supported by strongly indicative computational results in high-precision arithmetic for $m=5$ , $m=6$ , and $m=7$ . Our solutions to (2) are $12$ , $20$ , $32$ , $30$ , and $42$ for $m=4$ , $m=5$ , $m=6$ (complex arithmetic), $m=6$ (real arithmetic), and $m=7$ (complex arithmetic), respectively. The numerical experiments are reproducible and provided in a publicly available repository, including generated code for Matlab and Julia.

While the simulations focus on the determination of the dimension of $\Pi_{2^{m}}^{*}$ and the solution to (2), other aspects of the findings in this paper may be significant beyond these specific research questions. The transformations themselves are constructive and may be used to improve various properties of the evaluation scheme, such as numerical stability. The study of the polynomial subset in Section 5 is also constructive. We provide evaluation schemes for the truncated Taylor expansion of the matrix exponential at specified degrees. For instance, we achieve the degree- $30$ expansion using $m=6$ multiplications, whereas prior works require $7$ multiplications [27]. Moreover, our simulations are applicable to several different functions, and the software we provide can be used to compute the evaluation scheme coefficients for polynomials other than those reported in this paper.

2 Evaluation scheme

2.1 Definition of the evaluation scheme

Fundamental to the results of this paper is an evaluation scheme for constructing elements of $\Pi_{2^{m}}^{*}$ . Such evaluation schemes have been presented in various forms in [19, Section 2], [2, section 3], and [14], where in [14] they are referred to as degree-optimal polynomials. The evaluation scheme involves parameters stored in the matrices $A$ and $B$ , and the vector $c$ . Matrices $A$ and $B$ contain coefficients for linear combinations, where each row $k$ corresponds to the linear combinations associated with the $k$ th multiplication. As we perform $m$ multiplications, these matrices have $m$ rows. The vector $c$ contains coefficients for linear combinations that are used after the multiplications have been completed.

In particular, consider the triplet $(A,B,c)\in\mathbb{C}^{m\times(m+1)}\times\mathbb{C}^{m\times(m+1)}\times\mathbb{C}^{m+2}$ and associate the following sequence of operations involving exactly $m$ matrix-matrix multiplications. Define $Q_{1}=I$ and $Q_{2}=X$ . Then, we iterate the process to generate $Q_{3},\ldots,Q_{m+2}$ using the elements of matrices $A$ and $B$ as follows:

(3)	$\displaystyle Q_{3}$	$\displaystyle=(a_{11}Q_{1}+a_{12}Q_{2})(b_{11}Q_{1}+b_{12}Q_{2})$
	$\displaystyle Q_{4}$	$\displaystyle=(a_{21}Q_{1}+a_{22}Q_{2}+a_{23}Q_{3})(b_{21}Q_{1}+b_{22}Q_{2}+b_{23}Q_{3})$
	$\displaystyle Q_{5}$	$\displaystyle=(a_{31}Q_{1}+a_{32}Q_{2}+a_{33}Q_{3}+a_{34}Q_{4})(b_{31}Q_{1}+b_{32}Q_{2}+b_{33}Q_{3}+b_{34}Q_{4})$
		$\displaystyle\vdots$
	$\displaystyle Q_{m+2}$	$\displaystyle=(a_{m,1}Q_{1}+\cdots+a_{m,m+1}Q_{m+1})(b_{m,1}Q_{1}+\cdots+b_{m,m+1}Q_{m+1}).$

Furthermore, we compute the output of the scheme by forming the linear combination corresponding to the $c$ -vector:

(4)

p(X)=c_{1}Q_{1}+c_{2}Q_{2}+\cdots+c_{m+2}Q_{m+2}.

Thus, a given triple $(A,B,c)$ , with $A$ and $B$ being matrices, defines a polynomial $p\in\Pi_{2^{m}}^{*}$ .

Note that all evaluation schemes can be expressed in this manner, due to the fact that for each multiplication step we use all preceding information available. For instance, in the second multiplication step we determine $Q_{4}$ by computing the product of two linear combinations of $Q_{3}$ , $Q_{2}$ and $Q_{1}$ . Hence, a triplet $(A,B,c)$ parameterizes $\Pi_{2^{m}}^{*}$ through the expressions in (3) and (4).

Examples of instances of $\Pi_{2^{m}}^{*}$

To see how the tables represent standard evaluation schemes, such as monomial evaluation, Horner evaluation, and Paterson–Stockmeyer evaluation, see [14, Section 3]. In the following, we show how to evaluate

p(X)=X^{7}+\epsilon X^{8},

in three multiplications. Consider the coefficient triplet:


(5d)	$\displaystyle[A\|B]$	$\displaystyle=\left[\begin{array}[]{c c c c \| c c c c}0&1&&&0&1&&\\ 0&\frac{1}{2\epsilon}&1&&0&0&1&\\ 0&\frac{7+8\epsilon^{2}-16\epsilon^{4}}{128\epsilon^{5}}&-\frac{1}{8\epsilon^{2}}-\frac{1}{2}&1&0&\frac{-7+8\epsilon^{2}+16\epsilon^{4}}{128\epsilon^{5}}&-\frac{1}{8\epsilon^{2}}+\frac{1}{2}&1\end{array}\right]$
(5f)	$\displaystyle c$	$\displaystyle=\left[\begin{array}[]{ccccc}0&0&\frac{49-288\epsilon^{4}+256\epsilon^{8}}{16384\epsilon^{9}}&\frac{-5+16\epsilon^{4}}{64\epsilon^{3}}&\epsilon\end{array}\right].$

For this triplet $(A,B,c)$ , we obtain the polynomial $p(X)=X^{7}+\epsilon X^{8}$ using the evaluations specified in (3)-(4). We observe that $X^{7}$ is a limit point of $\Pi_{2^{3}}^{*}$ corresponding to $\epsilon\rightarrow 0$ , even though $X^{7}$ requires four multiplications. This example illustrates the complexity of $\Pi_{2^{m}}^{*}$ for $m=3$ . As $m$ increases, the complexity becomes even more intricate.

2.2 The semi-algebraic set perspective of $\Pi_{2^{m}}^{*}$

For our study, we need concepts from algebraic geometry. Let $\mathbb{K}$ be the field of the parameters in the evaluation. Let $\mathbb{X}$ be the product space associated with the parameters. That is, $\mathbb{X}:=\mathbb{K}^{m(m+3)/2}\times\mathbb{K}^{m(m+3)/2}\times\mathbb{K}^{m+2}=\mathbb{K}^{s}$ , where $s=m^{2}+4m+2$ is the number of parameters in the evaluation.

If we denote the evaluation scheme (3)–(4) by the map $\Phi:(A,B,c)\mapsto p\in\widebar{\Pi}_{d}$ , we can describe $\Pi_{2^{m}}^{*}$ as its image

(6)

\Phi(\mathbb{X})=\Pi_{2^{m}}^{*}.

By construction, $\Phi$ is algebraic, since it depends polynomially on the parameters. It follows from the Tarski–Seidenberg theorem [17, Theorem 8.6.6] that $\Pi_{2^{m}}^{*}$ is a semi-algebraic set, since it is the image of an algebraic map over a vector space.

Example (5d)–(5f) illustrates that the set is not necessarily an algebraic variety, i.e., it is only semialgebraic. More precisely, let $\widebar{\cdot}$ denote topological closure in a Zariski sense. Then, in relation to the example (5d)–(5f) we see that $\epsilon=0$ is not a valid choice, and indeed $X^{7}\not\in\Pi_{2^{3}}^{*}$ . However, since $\epsilon=0$ is a limit point, we have $X^{7}\in\widebar{\Pi_{2^{3}}^{*}}$ . Therefore $\Pi_{2^{m}}^{*}$ is not a variety since it does not include all of its limit points.

If $\varphi_{0},\ldots,\varphi_{d}$ denotes a basis of the ambient space $\mathbb{K}[x]_{d}$ , e.g., as in our simulations a monomial basis, and let $J$ be the Jacobian of the map $\Phi$ with an output expressed in this basis. Following the standard definitions in algebraic geometry [30], the dimension of a semi-algebraic set is given by the rank of $J$ at any non-singular point. Most points in the image of a smooth map are non-singular; therefore for almost all $(A,B,c)$ we have

\dim\Pi_{2^{m}}^{*}=\mathop{\operator@font rank}\nolimits J,

where $J\in\mathbb{K}^{(d+1)\times s}$ , $d=2^{m}$ and $s$ is the number of parameters in the method.

The dimension of $\Pi_{2^{m}}^{*}$ is always bounded by the number of free parameters, and in this setting it can be concluded from the size of $J$ that

(7)

\dim\Pi_{2^{m}}^{*}\leq s=m^{2}+4m+2.

In this paper we improve this bound by reducing the number of free parameters and in Section 4 we subsequently conclude that $m^{2}$ is the exact dimension, for $m>3$ .

We note that the properties of the set $\Pi_{2^{m}}^{*}$ differ from an algebraic perspective when considering $\mathbb{K}=\mathbb{C}$ versus $\mathbb{K}=\mathbb{R}$ . Most results in this paper are applicable to both cases; however, when explicitly discussing the Zariski closure, we will assume $\mathbb{K}=\mathbb{C}$ unless stated otherwise. In the simulations presented in Section 5, we concentrate on $\mathbb{K}=\mathbb{C}$ and provide additional observations for $\mathbb{K}=\mathbb{R}$ .

3 Equivalence transformations

3.1 Substitution transformations

The following theorems describe transformations of the evaluation scheme $(A,B,c)$ , as defined in Section 2, such that the output polynomial $p(X)$ is unchanged. The modified evaluation scheme will be denoted $(\hat{A},\hat{B},\hat{c})$ and the corresponding $Q$ -matrices will be denoted $\hat{Q}_{1},\ldots,\hat{Q}_{m+1}$ . The first transformation can be viewed as a change of variables, where we rescale one of the $Q$ -coefficients. For example if we set

\hat{Q}_{4}=\alpha Q_{4}

the tables (3) are changed by

\hat{Q}_{4}=(\alpha a_{21}Q_{1}+\alpha a_{22}Q_{2}+\alpha a_{23}Q_{3})(b_{21}Q_{1}+b_{22}Q_{2}+b_{23}Q_{3}).

In order to keep $Q_{5},\ldots,Q_{m+2}$ unchanged, we can cancel the transformation by scaling the coefficient in the corresponding term, e.g.,

Q_{5}=(a_{31}Q_{1}+a_{32}Q_{2}+a_{33}Q_{3}+\alpha^{-1}a_{34}\hat{Q}_{4})(b_{31}Q_{1}+b_{32}Q_{2}+b_{33}Q_{3}+\alpha^{-1}b_{34}\hat{Q}_{4}).

This implies that without changing the polynomial $p$ , we may rescale one row in $A$ and one column in $A$ and $B$ . This is formalized in the following theorem, where we omit indices on elements of $A$ and $B$ that are unchanged in $\hat{A}$ and $\hat{B}$ . By symmetry, we note that the same result holds for $A$ and $B$ switched.

Theorem 1.

Let $p$ be the polynomial associated with $(A,B,c)\in\mathbb{C}^{m\times(m+1)}\times\mathbb{C}^{m\times(m+1)}\times\mathbb{C}^{m+2}$ and let $\hat{p}$ be the polynomial associated with $(\hat{A},\hat{B},\hat{c})$ , where

(8)	$\displaystyle\hat{A}$	$\displaystyle=\left[\begin{array}[]{c c c c c c c c c c}a&a&\\ \vdots&&\ddots\\ a&\cdots&\cdots&a\\ \hat{a}_{k,1}&\cdots&\cdots&\cdots&\hat{a}_{k,k+1}\\ a&\cdots&\cdots&\cdots&a&\hat{a}_{k+1,k+2}\\ \vdots&&&&\vdots&\vdots&a\\ \vdots&&&&\vdots&\vdots&\vdots&\ddots\\ a&\cdots&\cdots&\cdots&a&\hat{a}_{m,k+2}&a&\cdots&a\\ \end{array}\right]$
	$\displaystyle\hat{B}$	$\displaystyle=\left[\begin{array}[]{c c c c c c c c c c}b&b&\\ \vdots&&\ddots\\ \phantom{x}b\phantom{x}&\cdots&\cdots&b\\ b&\cdots&\cdots&\cdots&\phantom{xx}b\phantom{xx}\\ b&\cdots&\cdots&\cdots&b&\hat{b}_{k+1,k+2}\\ \vdots&&&&\vdots&\vdots&b\\ \vdots&&&&\vdots&\vdots&\vdots&\ddots\\ b&\cdots&\cdots&\cdots&b&\hat{b}_{m,k+2}&b&\cdots&b\\ \end{array}\right]$
	$\displaystyle\hat{c}$	$\displaystyle=\;\left[\begin{array}[]{c c c c c c c c c}\phantom{x}c_{1}&\cdots&\cdots&\cdots&c_{k+1}&\alpha^{-1}c_{k+2}&c_{k+3}&\cdots&c_{m+2}\end{array}\right],$

and


(9a)	$\displaystyle\hat{a}_{k,j}$	$\displaystyle=$	$\displaystyle\alpha a_{k,j},\;\;\;\qquad j=1,\ldots,k+1,$
(9b)	$\displaystyle\hat{a}_{i,k+2}$	$\displaystyle=$	$\displaystyle\alpha^{-1}a_{i,k+2},\;\;\;i=k+1,\ldots,m,$
(9c)	$\displaystyle\hat{b}_{i,k+2}$	$\displaystyle=$	$\displaystyle\alpha^{-1}b_{i,k+2},\;\;\;i=k+1,\ldots,m.$

Then,

(10)

\hat{p}=p,

for any $k\in\left[1,\ldots,m\right]$ and $\alpha\in\mathbb{C}\backslash\{0\}$ .

Proof.

By definition, we have $\hat{Q}_{1}=Q_{1}=I$ and $\hat{Q}_{2}=Q_{2}=X$ . The proof is based on establishing the following three statements:


(11a)	$\displaystyle\hat{Q}_{i+2}$	$\displaystyle=$	$\displaystyle Q_{i+2}\qquad\text{for }i=1,\ldots,k-1,$
(11b)	$\displaystyle\hat{Q}_{k+2}$	$\displaystyle=$	$\displaystyle\alpha Q_{k+2},$
(11c)	$\displaystyle\hat{Q}_{i+2}$	$\displaystyle=$	$\displaystyle Q_{i+2}\qquad\text{for }i=k+1,\ldots,m+2.$

Together with the definition of $\hat{c}$ in (8), this implies that the conclusion of the theorem, stated in (10), holds.

To prove (11a), we observe that since the first $k-1$ rows of $\hat{A}$ and $\hat{B}$ are unchanged, the variables $\hat{Q}_{1},\ldots,\hat{Q}_{k+1}$ are also unchanged.

To prove (11b), we consider row $k$ , i.e., the first row where there are changes. Inserting (9a) in the definition of $\hat{Q}_{k+1}$ yields a scaling,

\displaystyle\hat{Q}_{k+2}

\displaystyle=(\alpha a_{k,1}Q_{1}+\cdots+\alpha

\displaystyle a_{k,k+1}Q_{k+1}

\displaystyle)(b_{k,1}Q_{1}+\cdots+b_{k,k+1}Q_{k+1}

\displaystyle)=\alpha Q_{k+2},

which proves (11b).

To prove (11c), we first consider (11c) for $i=k+1$ . Inserting (9b) in the definition of $\hat{Q}_{k+3}$ results in a cancellation

	$\displaystyle\hat{Q}_{k+3}$	$\displaystyle=(a_{k+1,1}Q_{1}+\ldots+\alpha^{-1}a_{k+1,k+2}\hat{Q}_{k+2}$		$\displaystyle)(b_{k+1,1}Q_{1}+\ldots+\alpha^{-1}b_{k+1,k+2}\hat{Q}_{k+2}$		$\displaystyle)$
		$\displaystyle=Q_{k+3},$

where the last equality follows from (11b). The general statement (11c) follows from induction. ∎

In a similar fashion, we carry out a change of variables where we add a given value $\alpha$ to one of the elements in the first column. For example, adding $\alpha$ to the coefficient corresponding to $Q_{1}=I$ in the first multiplication leads to

\hat{Q}_{3}=\left((a_{11}+\alpha)Q_{1}+a_{12}Q_{2}\right)\left(b_{11}Q_{1}+b_{12}Q_{2}\right),

which can be expanded as

(12)

\hat{Q}_{3}=Q_{3}+\alpha(b_{11}Q_{1}+b_{12}Q_{2}).

Let $A_{k}$ and $B_{k}$ be the factors that form $Q_{k}$ , i.e., $Q_{k}=A_{k}B_{k}$ . In order to keep $Q_{4},\ldots,Q_{m+2}$ unchanged, we keep both factors in $\hat{Q}_{4}=Q_{4}=A_{4}B_{4}$ unchanged by compensating for the transformation (12):

(13)		$\displaystyle A_{4}$	$\displaystyle=$	$\displaystyle(a_{21}-\alpha a_{23}b_{11})Q_{1}+(a_{22}-\alpha a_{23}b_{12})Q_{2}+a_{23}\hat{Q}_{3}$
(14)		$\displaystyle B_{4}$	$\displaystyle=$	$\displaystyle(b_{21}-\alpha b_{23}b_{11})Q_{1}+(b_{22}-\alpha b_{23}b_{12})Q_{2}+b_{23}\hat{Q}_{3}.$

Hence, a modification in the coefficient corresponding to $Q_{1}$ can be compensated for by modifying the coefficients in all rows below the modification. This can be applied to an arbitrary row $k$ and arbitrary $\alpha$ .

Theorem 2.

(15)	$\displaystyle\hat{A}$	$\displaystyle=\left[\begin{array}[]{c c c c c c c c c c}a&a&\\ \vdots&&\ddots\\ a&a&\cdots&a\\ a+\alpha&a&\cdots&\cdots&a\\ \hat{a}_{k+1,1}&\hat{a}_{k+1,2}&\cdots&\cdots&\hat{a}_{k+1,k+1}&a\\ \vdots&\vdots&&&\vdots&\vdots&\ddots\\ \hat{a}_{m,1}&\hat{a}_{m,2}&\cdots&\cdots&\hat{a}_{m,k+1}&a&\cdots&a\\ \end{array}\right]$
	$\displaystyle\hat{B}$	$\displaystyle=\left[\begin{array}[]{c c c c c c c c c c}b&b&\\ \vdots&&\ddots\\ b&\cdots&\cdots&b\\ \hat{b}_{k+1,1}&\cdots&\cdots&\hat{b}_{k+1,k+1}&b\\ \vdots&&&\vdots&\vdots&\ddots\\ \hat{b}_{m,1}&\cdots&\cdots&\hat{b}_{m,k+1}&b&\cdots&b\\ \end{array}\right]$
	$\displaystyle\hat{c}$	$\displaystyle=\;\left[\;\begin{array}[]{c c c c c c c c}\phantom{x}\hat{c}_{1}\phantom{x}&\cdots&\cdots&\phantom{x}\hat{c}_{k+1}\phantom{x}&c&\cdots&\cdots&c\end{array}\right].$

and


(16a)	$\displaystyle\hat{a}_{i,j}$	$\displaystyle=$	$\displaystyle a_{i,j}-\alpha a_{i,k+2}b_{k,j},\;\quad j=1,\ldots,k+1,\quad i=k+1,\ldots,m,$
(16b)	$\displaystyle\hat{b}_{i,j}$	$\displaystyle=$	$\displaystyle b_{i,j}-\alpha b_{i,k+2}b_{k,j},\;\;\quad j=1,\ldots,k+1,\quad i=k+1,\ldots,m,$
(16c)	$\displaystyle\hat{c}_{j}$	$\displaystyle=$	$\displaystyle c_{j}-\alpha c_{k+2}b_{k,j},\;\;\qquad j=1,\ldots,k+1.$

Then,

(17)

\hat{p}=p,

for any $k\in\left[1,\ldots,m\right]$ and $\alpha\in\mathbb{C}$ .

Proof.

Similarly to the proof of Theorem 1, we establish the following three statements:


(18a)	$\displaystyle\hat{Q}_{i+2}$	$\displaystyle=$	$\displaystyle Q_{i+2}\quad\text{for }i=1,\ldots,k-1,$
(18b)	$\displaystyle\hat{Q}_{k+2}$	$\displaystyle=$	$\displaystyle Q_{k+2}+\alpha B_{k+2},$
(18c)	$\displaystyle\hat{Q}_{i}$	$\displaystyle=$	$\displaystyle Q_{i}\quad\text{for }i=k+3,\ldots,m+2.$

Together with the definition of $\hat{c}$ in (16c), this implies that the conclusion of the theorem in equation (17) holds. Relation (18a) follows analogously to the proof in Theorem 1.

To prove (18b), we insert the modified element $\hat{a}_{k,1}$ given in (15) and observe that

	$\displaystyle\hat{Q}_{k+2}$	$\displaystyle=$	$\displaystyle\left((a_{k,1}+\alpha)Q_{1}+a_{k,2}Q_{2}+\cdots+a_{k,k+1}Q_{k+1}\right)B_{k+2}$
		$\displaystyle=$	$\displaystyle A_{k+2}B_{k+2}+\alpha B_{k+2}=Q_{k+2}+\alpha B_{k+2}.$

To prove (18c), we prove that $\hat{A}_{k+3}=A_{k+3},\ldots,\hat{A}_{m+2}=A_{m+2}$ and $\hat{B}_{k+3}=B_{k+3},\ldots,\hat{B}_{m+2}=B_{m+2}$ , i.e., that the factors for $\hat{Q}_{k+3}$ ,…, $\hat{Q}_{m+2}$ are unchanged. For $k+3$ we have

(19)	$\displaystyle\hat{A}_{k+3}$	$\displaystyle=$	$\displaystyle\hat{a}_{k+1,1}Q_{1}+\cdots+\hat{a}_{k+1,k+1}Q_{k+1}+a_{k+1,k+2}\hat{Q}_{k+2}$
		$\displaystyle=$	$\displaystyle A_{k+3}-\alpha a_{k+1,k+2}(b_{k,1}Q_{1}+\cdots+b_{k,k+1}Q_{k+1})+a_{k+1,k+2}\alpha B_{k+2}$
		$\displaystyle=$	$\displaystyle A_{k+3}-\alpha a_{k+1,k+2}B_{k+2}+\alpha a_{k+1,k+2}B_{k+2}=A_{k+3},$

where we have inserted (16a) and (18b) to obtain the second equality. The relation $\hat{B}_{k+3}=B_{k+3}$ can be shown analogously using (16b) and (18b). By induction, the corresponding relation for any factor. Consequently, we have

(20)

\hat{Q}_{i+2}=\hat{A}_{i+2}\hat{B}_{i+2}=A_{i+2}B_{i+2}=Q_{i+2},\quad i=k+1,\ldots,m

which proves (18c).

The conclusion (17) follows from the same construction as in (19). ∎

3.2 Normalized forms and unreduced schemes

The theorems in the previous section can be applied to impose a certain structure on the triplet $(A,B,c)$ without (or with very little) loss of generality. For any triplet $(A,B,c)$ we can invoke Theorem 2 repeatedly. By setting $\alpha$ to the negation of the element in the first column, we obtain matrices $A$ and $B$ with a first column containing only zeros. The first column corresponds to the addition of a scaled identity, which is independent (constant) with respect to the input $X$ . Note that this can be done for any triplet, and no generality (in the sense of parameterized polynomials) is lost by assuming the first column is zero.

Definition 3.

If $a_{1,1}=\cdots=a_{m,1}=b_{1,1}=\cdots=b_{m,1}=0$ , we call the evaluation scheme a constant-free evaluation scheme.

Corollary 4.

Any evaluation scheme is equivalent to a constant-free evaluation scheme.

Similarly, if we assume that the Hessenberg matrices $A$ and $B$ are unreduced [11, p. 381], i.e., the elements $a_{1,2},\ldots,a_{m,m+1}$ and $b_{1,2},\ldots,b_{m,m+1}$ are all nonzero, we can impose further structure by repeatedly applying Theorem 1 with scaling determined by the corresponding last nonzero element of the row. This process imposes a normalization on each row.

Definition 5.

A constant-free evaluation scheme satisfying $a_{1,2}=\cdots=a_{m,m+1}=b_{1,2}=\cdots=b_{m,m+1}=1$ is called a normalized evaluation scheme, and we call the triplet $(A,B,c)$ normalized.

Corollary 6.

Any evaluation scheme corresponding to $(A,B,c)$ where $A$ and $B$ are unreduced Hessenberg matrices is equivalent to a normalized evaluation scheme.

3.3 Further transformations

For normalized constant-free evaluation schemes, we have $a_{11}=b_{11}=0$ and $a_{12}=b_{12}=1$ , meaning that the first multiplication always corresponds to squaring the input matrix, i.e.,

(21)

Q_{3}=X^{2}.

This assumption can be made without loss of generality and will be used henceforth. This fact was already observed by Paterson and Stockmeyer [19, p. 61].

We now derive further transformations under a technical assumption. For the first two rows we assume that we have the structure of an unreduced constant-free triplet.¹¹1This assumption is made mostly to simplify the derivation, and similar results hold, e.g., when $a_{23}=0$ . This in turn is equivalent to assuming $a_{23}=b_{23}=1$ .

Consider the following transformation based on modifying the second row. We add $\alpha$ and $\beta$ to the second element in the second row in $A$ and $B$ , respectively, i.e.,


(22a)	$\displaystyle\hat{Q}_{4}$	$\displaystyle=$	$\displaystyle((a_{22}+\alpha)Q_{2}+Q_{3})((b_{22}+\beta)Q_{2}+Q_{3})$
(22b)		$\displaystyle=$	$\displaystyle Q_{4}+(\alpha\beta+\alpha b_{22}+\beta a_{22})X^{2}+(\alpha+\beta)X^{3}.$

If we enforce $\beta=-\alpha$ , one term in the modification disappears, and we can compensate for the other term in a way such that $\hat{Q}_{5}=Q_{5},\ldots,\hat{Q}_{m}=Q_{m}$ , i.e., they are unmodified. Consequently, the output polynomial is also unmodified.

Theorem 7.

Let $p$ be the polynomial associated with a constant-free triplet $(A,B,c)\in\mathbb{C}^{m\times(m+1)}\times\mathbb{C}^{m\times(m+1)}\times\mathbb{C}^{m+2}$ , satisfying $a_{23}=b_{23}=1$ , and let $\hat{p}$ be the polynomial associated with $(\hat{A},\hat{B},\hat{c})$ where


(23a)	$\displaystyle\hat{A}$	$\displaystyle=$	$\displaystyle\begin{bmatrix}0&1&&&&&\\ 0&a+\alpha&a&&&&\\ 0&a&\hat{a}_{3,3}&a&&&\\ 0&a&\hat{a}_{4,3}&a&\ddots&&\\ \vdots&\vdots&\vdots&\vdots&\ddots&a&\\ 0&a&\hat{a}_{m,3}&a&\cdots&a&a&\end{bmatrix}$
(23b)	$\displaystyle\hat{B}$	$\displaystyle=$	$\displaystyle\begin{bmatrix}0&1&&&&&\\ 0&b-\alpha&b&&&&\\ 0&b&\hat{b}_{3,3}&b&&&\\ 0&b&\hat{b}_{4,3}&b&\ddots&&\\ \vdots&\vdots&\vdots&\vdots&\ddots&b&\\ 0&b&\hat{b}_{m,3}&b&\cdots&b&b&\end{bmatrix}$

and

(24)

\hat{c}=\left[\begin{array}[]{c c c c c c}c_{1}&c_{2}&\hat{c}_{3}&c_{4}&\cdots&c_{m+2}\end{array}\right].

The modifications to the coefficients are given by


(25a)	$\displaystyle\hat{a}_{i,3}$	$\displaystyle=$	$\displaystyle a_{i,3}-sa_{i,4},\quad i=3,\ldots,m.$
(25b)	$\displaystyle\hat{b}_{i,3}$	$\displaystyle=$	$\displaystyle b_{i,3}-sb_{i,4}\quad i=3,\ldots,m.$
(25c)	$\displaystyle\hat{c}_{3}$	$\displaystyle=$	$\displaystyle c_{3}-sc_{4},$

where

s=-\alpha^{2}+\alpha(b_{22}-a_{22}).

Then,

(26)

p=\hat{p}.

Proof.

From the first row we have $\hat{Q}_{3}=Q_{3}=X^{2}$ . The proof is based on establishing the following two statements:


(27a)	$\displaystyle\hat{Q}_{4}$	$\displaystyle=$	$\displaystyle Q_{4}+sQ_{3},$
(27b)	$\displaystyle\hat{Q}_{i+2}$	$\displaystyle=$	$\displaystyle Q_{i+2},\quad i=3,\ldots,m.$

This, together with the definition of $\hat{c}$ in (25c) implies that the theorem conclusion (26) holds.

To prove (27a), we use relation (22). More precisely, we substitute $\beta=-\alpha$ into (22b) and obtain

(28)

\begin{split}\hat{Q}_{4}&=Q_{4}+(-\alpha^{2}+\alpha(b_{22}-a_{22}))X^{2}=Q_{4}+sQ_{3}.\end{split}

To prove (27b), we show that the factors for $Q_{3},\ldots,Q_{m+2}$ are unchanged, i.e., $\hat{A}_{3}=A_{3},\ldots,\hat{A}_{m+2}=A_{m+2}$ and $\hat{B}_{3}=B_{3},\ldots,\hat{B}_{m+2}=B_{m+2}$ . For the first factor equality, we have


(29a)	$\displaystyle\hat{A}_{5}$	$\displaystyle=$	$\displaystyle a_{32}Q_{2}+\hat{a}_{33}Q_{3}+a_{34}\hat{Q}_{4}$
(29b)		$\displaystyle=$	$\displaystyle a_{32}Q_{2}+a_{33}Q_{3}-a_{34}sQ_{3}+a_{34}Q_{4}+a_{34}sQ_{3}=A_{5},$

where we have used (25a) and (27a) with $i=3$ , in the second equality. The relation $\hat{B}_{5}=B_{5}$ can be shown analogously using (25b) and (27a). By induction, we can prove the corresponding factor relation for $i=4,\ldots,m$ . Consequently, we have $\hat{Q}_{i+2}=\hat{A}_{i+2}\hat{B}_{i+2}=A_{i+2}B_{i+2}=Q_{i+2}$ for $i=3,\ldots,m,$ which proves (27b). The theorem conclusion (26) follows by the same construction as (29). ∎

For a given evaluation scheme satisfying $a_{23}=b_{23}=1$ , we can apply the previous theorem with $\alpha=b_{22}$ , resulting in $\hat{b}_{22}=0$ . This shows that we can assume $b_{22}=0$ for evaluation schemes that are unreduced in the first two rows, without loss of generality. For the second multiplication, under these assumptions, we get

(30)

Q_{4}=(a_{22}Q_{2}+Q_{3})Q_{3}=a_{22}Q_{2}Q_{3}+Q_{3}^{2}=a_{22}X^{3}+X^{4}.

Next, we state and prove a theorem for the third row of the coefficient matrices. For this we assume that the first three rows are unreduced, i.e., $a_{23}=a_{34}=b_{23}=b_{34}=1$ . The theorem is based on perturbing each of the free elements in the third row of $A$ and $B$ , while simultaneously preserving the final output polynomial. In particular, we use perturbations with the following structure:


$\displaystyle\hat{a}_{32}$	$\displaystyle=$	$\displaystyle a_{32}+\alpha,$
$\displaystyle\hat{b}_{32}$	$\displaystyle=$	$\displaystyle b_{32}-\alpha,$
$\displaystyle\hat{a}_{33}$	$\displaystyle=$	$\displaystyle a_{33}+\beta,$
$\displaystyle\hat{b}_{33}$	$\displaystyle=$	$\displaystyle b_{33}-\beta.$

It turns out that when the perturbations have this particular structure, $Q_{5}$ is modified by the addition of a linear combination of $Q_{3}$ , $Q_{4}$ and $X^{3}$ . This is advantageous because we can compensate for $Q_{3}$ and $Q_{4}$ , since we have access to these matrices directly. Moreover, we can ensure that the $X^{3}$ -coefficient modification is zero by placing an additional condition on $\alpha$ and $\beta$ , encoded in the equality $z(\alpha,\beta)=0$ , where $z$ is a function explicitly given in the theorem.

Theorem 8.

Let $p$ be the polynomial associated with the constant-free triplet $(A,B,c)\in\mathbb{C}^{m\times(m+1)}\times\mathbb{C}^{m\times(m+1)}\times\mathbb{C}^{m+2}$ satisfying $a_{23}=a_{34}=b_{23}=b_{34}=1$ , and let $\hat{p}$ be the polynomial associated with $(\hat{A},\hat{B},\hat{c})$ where


(32a)	$\displaystyle\hat{A}$	$\displaystyle=$	$\displaystyle\begin{bmatrix}0&1&&&&&\\ 0&a&a&&&&\\ 0&a+\alpha&a+\beta&a&&&\\ 0&a&\hat{a}_{4,3}&\hat{a}_{4,4}&\ddots&&\\ \vdots&\vdots&\vdots&\vdots&\ddots&a&\\ 0&a&\hat{a}_{m,3}&\hat{a}_{m,4}&\cdots&a&a&\end{bmatrix}$
(32b)	$\displaystyle\hat{B}$	$\displaystyle=$	$\displaystyle\begin{bmatrix}0&1&&&&&\\ 0&0&b&&&&\\ 0&b-\alpha&b-\beta&b&&&\\ 0&b&\hat{b}_{4,3}&\hat{b}_{4,4}&\ddots&&\\ \vdots&\vdots&\vdots&\vdots&\ddots&b&\\ 0&b&\hat{b}_{m,3}&\hat{b}_{m,4}&\cdots&b&b&\end{bmatrix}$

and

(33)

\hat{c}=\left[\begin{array}[]{c c c c c c c}c_{1}&c_{2}&\hat{c}_{3}&\hat{c}_{4}&c_{5}&\cdots&c_{m+2}\end{array}\right].

The modifications to the coefficients are given by


(34a)	$\displaystyle\hat{a}_{i,3}$	$\displaystyle=$	$\displaystyle a_{i,3}-a_{i,5}s_{1},\quad i=4,\ldots,m,$
(34b)	$\displaystyle\hat{a}_{i,4}$	$\displaystyle=$	$\displaystyle a_{i,4}-a_{i,5}s_{2},\quad i=4,\ldots,m,$
(34c)	$\displaystyle\hat{b}_{i,3}$	$\displaystyle=$	$\displaystyle b_{i,3}-b_{i,5}s_{1},\quad i=4,\ldots,m,$
(34d)	$\displaystyle\hat{b}_{i,4}$	$\displaystyle=$	$\displaystyle b_{i,4}-b_{i,5}s_{2},\quad i=4,\ldots,m,$
(34e)	$\displaystyle\hat{c}_{3}$	$\displaystyle=$	$\displaystyle c_{3}-c_{5}s_{1},$
(34f)	$\displaystyle\hat{c}_{4}$	$\displaystyle=$	$\displaystyle c_{4}-c_{5}s_{2},$

where


(35a)	$\displaystyle s_{1}$	$\displaystyle=$	$\displaystyle\alpha(b_{32}-a_{32})-\alpha^{2},$
(35b)	$\displaystyle s_{2}$	$\displaystyle=$	$\displaystyle\beta(b_{33}-a_{33})-\beta^{2}.$

Then,

(36)

p=\hat{p}

for any $\alpha$ and $\beta$ satisfying

(37)

z(\alpha,\beta)\mathrel{\mathop{:}}=\alpha(b_{33}-a_{33})+\beta(b_{32}-a_{32})-2\alpha\beta-s_{2}a_{22}=0.

Proof.

It follows from the first two rows that $\hat{Q}_{3}=Q_{3}=X^{2}$ , $\hat{Q}_{4}=Q_{4}=a_{22}X^{3}+X^{4}$ . The proof is based on establishing the following two statements:


(38a)	$\displaystyle\hat{Q}_{5}$	$\displaystyle=$	$\displaystyle Q_{5}+s_{1}Q_{3}+s_{2}Q_{4},$
(38b)	$\displaystyle\hat{Q}_{i+2}$	$\displaystyle=$	$\displaystyle Q_{i+2},\quad i=4,\ldots,m.$

This, together with the definition of $\hat{c}$ in (34e) and (34f) implies that the theorem conclusion (36) holds.

To prove (38a) we express $\hat{Q}_{5}$ in terms of the multiplication factors and substitute the definition of the modified coefficients. We obtain

(39)

\begin{split}\hat{Q}_{5}&=\hat{A}_{5}\hat{B}_{5}\\ &=((a_{32}+\alpha)Q_{2}+(a_{33}+\beta)Q_{3}+Q_{4})((b_{32}-\alpha)Q_{2}+(b_{33}-\beta)Q_{3}+Q_{4})\\ &=\big{(}A_{5}+(\alpha Q_{2}+\beta Q_{3})\big{)}\big{(}B_{5}-(\alpha Q_{2}+\beta Q_{3})\big{)}.\end{split}

This expression can be simplified by using $Q_{5}=A_{5}B_{5}$ :

(40)

\begin{split}\hat{Q}_{5}&=Q_{5}+(B_{5}-A_{5})(\alpha Q_{2}+\beta Q_{3})-(\alpha Q_{2}+\beta Q_{3})^{2}.\end{split}

The next step is to show that the difference between $Q_{5}$ and $\hat{Q}_{5}$ is a linear combination of $Q_{3}$ , $Q_{4}$ , and $X^{3}$ , where the $X^{3}$ -coefficient is given by $z(\alpha,\beta)$ , which is zero by assumption. We factorize the expression in (40) to obtain

(41)

\begin{split}\hat{Q}_{5}&=Q_{5}+\big{(}B_{5}-A_{5}-(\alpha Q_{2}+\beta Q_{3})\big{)}\big{(}\alpha Q_{2}+\beta Q_{3}\big{)}.\end{split}

Before proceeding we define

(42)

\begin{split}d_{32}:=b_{32}-a_{32},\\ d_{33}:=b_{33}-a_{33}.\end{split}

This allows us to describe the difference between the multiplication factors more compactly:

(43)

\begin{split}B_{5}-A_{5}&=(b_{32}Q_{2}+b_{33}Q_{3}+Q_{4})-(a_{32}Q_{2}+a_{33}Q_{3}+Q_{4})\\ &=(b_{32}-a_{32})Q_{2}+(b_{33}-a_{33})Q_{3}.\\ &=d_{32}Q_{2}+d_{33}Q_{3}.\end{split}

We substitute this expression into (41) and simplify the first factor of the difference $\hat{Q}_{5}-Q_{5}$ such that

(44)

\begin{split}\hat{Q}_{5}&=Q_{5}+\big{(}d_{32}Q_{2}+d_{33}Q_{3}-\alpha Q_{2}-\beta Q_{3}\big{)}\big{(}\alpha Q_{2}+\beta Q_{3}\big{)}\\ &=Q_{5}+\big{(}(d_{32}-\alpha)Q_{2}+(d_{33}-\beta)Q_{3}\big{)}\big{(}\alpha Q_{2}+\beta Q_{3}\big{)}.\\ \end{split}

Next, we expand the final expression

(45)

\begin{split}\hat{Q}_{5}&=Q_{5}+\alpha(d_{32}-\alpha)Q_{2}^{2}+\big{(}\alpha(d_{33}-\beta)+\beta(d_{32}-\alpha)\big{)}Q_{2}Q_{3}+\beta(d_{33}-\beta)Q_{3}^{2}\\ &=Q_{5}+s_{1}Q_{2}^{2}+(\alpha d_{33}+\beta d_{32}-2\alpha\beta)Q_{2}Q_{3}+s_{2}Q_{3}^{2},\\ \end{split}

where we have simplified in the last equality using (35a) and (35b). By using $Q_{3}=Q_{2}^{2}$ , $Q_{2}Q_{3}=X^{3}$ and $Q_{3}^{2}=X^{4}$ , we can rewrite this as

(46)

\hat{Q}_{5}=Q_{5}+s_{1}Q_{3}+(\alpha d_{33}+\beta d_{32}-2\alpha\beta)X^{3}+s_{2}X^{4}.

Finally, we use (30) in order to express $X^{4}$ in terms of $Q_{4}$ and $X^{3}$

(47)

\begin{split}\hat{Q}_{5}&=Q_{5}+s_{1}Q_{3}+(\alpha d_{33}+\beta d_{32}-2\alpha\beta)X^{3}+s_{2}(Q_{4}-a_{22}X^{3})\\ &=Q_{5}+s_{1}Q_{3}+\big{(}\alpha d_{33}+\beta d_{32}-2\alpha\beta-s_{2}a_{22}\big{)}X^{3}+s_{2}Q_{4}\\ &=Q_{5}+s_{1}Q_{3}+z(\alpha,\beta)X^{3}+s_{2}Q_{4}\\ &=Q_{5}+s_{1}Q_{3}+s_{2}Q_{4},\end{split}

where we have used that $z(\alpha,\beta)=0$ in the last equality. This proves statement (38a).

To prove (38b) we show that the multiplication factors are unchanged, i.e., $\hat{A}_{6}=A_{6},\ldots\hat{A}_{m+2}=A_{m+2}$ and $\hat{B}_{6}=B_{6},\ldots\hat{B}_{m+2}=B_{m+2}$ . For the first factor we have

(48)

\begin{split}\hat{A}_{6}&=(a_{42}Q_{2}+\hat{a}_{43}Q_{3}+\hat{a}_{44}Q_{4}+a_{45}\hat{Q}_{5})\\ &=a_{42}Q_{2}+(a_{43}-a_{45}s_{1})Q_{3}+(a_{44}-a_{45}s_{2})Q_{4}+a_{45}Q_{5}+a_{45}(s_{1}Q_{3}+s_{2}Q_{4})\\ &=A_{6}-a_{45}s_{1}Q_{2}-a_{45}s_{2}Q_{4}+a_{45}(s_{1}Q_{3}+s_{2}Q_{4})\\ &=A_{6},\end{split}

where we have used (34a), (34b) and (38a) in the second equality. The relation $\hat{B}_{6}=B_{6}$ follows analogously using (34c), (34d) and (38a).

The corresponding relation can be shown for all factors using induction. Consequently, we have

(49)

\hat{Q}_{i+2}=\hat{A}_{i+2}\hat{B}_{i+2}=A_{i+2}B_{i+2}=Q_{i+2},\quad i=4,\ldots,m,

which proves (38b). The theorem conclusion (36) follows by the same construction as (48). ∎

Free variables in Theorem 8

Note that Theorem 8 includes a scalar-valued condition, $z(\alpha,\beta)=0$ , involving two scalar variables. Let $d_{32}=b_{32}-a_{32}$ and $d_{33}=b_{33}-a_{33}$ be defined as in the proof of the theorem. If we let $\alpha$ be given, we get a quadratic equation in $\beta$ :

(50)

d_{33}\alpha=a_{22}\beta^{2}+\beta\left(2\alpha+d_{32}-a_{22}d_{33}\right).

When $a_{22}\neq 0$ , the solution $\beta$ to the equation is explicitly available from the solution of the quadratic equation. This has a disadvantage of introducing a square root operation. In a real setting, this can yield complex coefficients.

This is related to the result in paper [24] as follows. The results suggest several approaches to evaluate polynomials with a low number multiplications. For the case $m=3$ , the formulas [24, Eq. (31)] involve a square root, and indeed that approach can be derived from the above transformation with $\alpha=-a_{32}$ and solving for $\beta$ .

Suppose $\beta$ is given. Then, the solution for $\alpha$ can be expressed as

(51)

\alpha=\frac{a_{22}\beta^{2}+(d_{32}-a_{22}d_{33})\beta}{d_{33}-2\beta}

with the condition $\beta\neq\frac{1}{2}d_{33}$ . To reframe this condition in terms of entries in matrices $A$ and $B$ , we use a change of variables that essentially generalizes [21] (where it is given for $m=3$ ):

(52)

\beta=\frac{1}{2}d_{33}+r,\quad r\neq 0.

With this choice, we obtain updated table entries:

(53)

\begin{split}\hat{a}_{33}&=a_{33}+\beta=a_{33}+\frac{b_{33}-a_{33}}{2}+r=\frac{a_{33}+b_{33}}{2}+r,\\ \hat{b}_{33}&=b_{33}-\beta=b_{33}-\frac{b_{33}-a_{33}}{2}-r=\frac{a_{33}+b_{33}}{2}-r.\end{split}

This leads to the relation

(54)

\hat{a}_{33}-\hat{b}_{33}=2r.

Consequently, any evaluation scheme that is unreduced in the first three rows can be transformed into an equivalent scheme satisfying

(55)

b_{33}=a_{33}+1,

since we can choose any nonzero $r$ . Thus, when considering evaluation schemes that are unreduced in the first three rows, we can assume without loss of generality that $b_{33}=a_{33}+1$ . This assumption allows us to reduce the number of variables without introducing additional constraints, as explained in the following section.

4 Minimality of parameterization

4.1 Minimality of unreduced evaluation schemes

As a consequence of the equivalence theorems in the previous section, we can conclude that any unreduced evaluation scheme can be assumed to have the form

(56)

[A|B]=\left[\begin{array}[]{ccccccc|cccccccc}0&1&&&&&&0&1&&&&\\ 0&a&1&&&&&0&0&1&&&\\ 0&a&a_{33}&1&&&&0&b&{a_{33}+1}&1&&\\ 0&a&a&a&1&&&0&b&b&b&1&\\ \vdots&\vdots&\vdots&\vdots&\ddots&\ddots&&\vdots&\vdots&\vdots&\vdots&\ddots&\ddots\\ 0&a&a&a&\cdots&a&1&0&b&b&b&\cdots&b&1\\ \end{array}\right],

with an arbitrary $c$ -vector. The dimension of $\Pi_{2^{m}}^{*}$ is determined from the rank of the Jacobian for a generic element in $\Pi_{2^{m}}^{*}$ . A generic element in $\Pi_{2^{m}}^{*}$ is unreduced; hence, the dimension is bounded by the number of free variables in a parameterization of the unreduced schemes. The parameterization (56) contains $m^{2}$ parameters and, therefore, similar to (7), we have the following.

Corollary 9.

For $m>2$ , we have

\dim(\Pi^{*}_{2^{m}})\leq m^{2}.

It turns out that this bound is sharp; i.e., we can establish a lower bound which is also $m^{2}$ . To this end it is sufficient to find a triplet $(A,B,c)$ such that the Jacobian of the output $p(X)$ with respect to the free variables has rank $m^{2}$ . Many trivial choices of unreduced $(A,B,c)$ , e.g., a triplet corresponding to only a sequence of squaring operations, lead to a points in $\Pi_{2^{m}}^{*}$ whose rank is less than $m^{2}$ . The triplet with a full Jacobian rank and leading to the simplest formulas for the Jacobian that we could find is the following:

(57)

[A|B]=\left[\begin{array}[]{cccccc|cccccc}0&1&&&&&0&1&&&&\\ 0&1&1&&&&0&1&1&&&\\ 0&1&0&1&&&0&1&1&1&&\\ \vdots&\vdots&\vdots&\ddots&\ddots&&\vdots&\vdots&\vdots&\ddots&\ddots&\\ 0&1&0&\cdots&0&1&0&1&1&\cdots&1&1\\ \end{array}\right]

and $c=[0,\ldots,0,1]$ .

In order to bound the rank, it is sufficient to find $m^{2}$ linearly independent columns in the Jacobian. A sufficient condition for linear independence for polynomials is that they have distinct degrees. Based on that reasoning, we now establish linear combinations of partial derivatives of the output of the evaluation, i.e., $p(X)$ , with respect to the elements of $A,B$ and $c$ . We obtain $m^{2}$ distinct degrees.

Although the choice (57) leads to the simplest derivation we could establish analytically, it is admittedly not very simple. The proof of the general case can be found in the appendix. For illustration, we sketch the derivation for $m=4$ , specifying the partial derivatives, which can be computed with symbolic computation tool (e.g., those described in the next section). In the full proof in the appendix we provide the details without such tools. We base the proof, as well as this sketch on a separation into cases, each case leading to a polynomial degrees that are distinct, and adding up to $m^{2}=16$ different degrees.

Case 1 corresponds to forming derivatives with respect to elements of $c$ : $\frac{\partial p}{\partial c_{1}}=1,\ \frac{\partial p}{\partial c_{2}}=X,\ \frac{\partial p}{\partial c_{3}}=X^{2},\ \frac{\partial p}{\partial c_{4}}=Q_{4},\ \frac{\partial p}{\partial c_{5}}=Q_{5},\ \frac{\partial p}{\partial c_{6}}=Q_{6}$ . Since $(A,B,c)$ in (57) is unreduced, $Q_{4},Q_{5}$ and $Q_{6}$ have maximal degree and we have that

(58)

\deg\left(\frac{\partial p}{\partial c_{i}}\right)=\deg(Q_{i})=2^{i-2},i=2,\ldots,m+2.

Case 2 corresponds to forming derivatives with respect to $a_{i,j}$ . For example, we have

(59)		$\displaystyle\frac{\partial p}{\partial a_{4,4}}$	$\displaystyle=$	$\displaystyle X^{12}+6X^{11}+16X^{10}+26X^{9}+\mathcal{O}(X^{8})$
(60)		$\displaystyle\frac{\partial p}{\partial a_{4,2}}$	$\displaystyle=$	$\displaystyle X^{9}+4X^{8}+7X^{7}+8X^{6}+7X^{5}+\mathcal{O}(X^{4}).$

As shown in the proof of the theorem, the general formula for the degrees in Case 2 is

(61)

\deg\left(\frac{\partial p}{\partial a_{i,j}}\right)=2^{m}-2^{i-1}+2^{j-2},\textrm{ for }i=2,\ldots,m,\;\;j=2,\ldots,i

Case 3 stems from the observation that we obtain the same degree if we differentiate with respect to $a_{i,j}$ and $b_{i,j}$ . Therefore we form the difference in order to obtain a different degree. For example,

(62)

\frac{\partial p}{\partial b_{4,2}}=X^{9}+4X^{8}+7X^{7}+8X^{6}+6X^{5}+\mathcal{O}(X^{4})

and the difference with (60) is

(63)

\frac{\partial p}{\partial b_{4,2}}-\frac{\partial p}{\partial a_{4,2}}=-X^{5}-2X^{4}-2X^{3}.

The degree $5$ is distinct from the degrees in Cases 1 and 2. In the appendix we prove that the general formula for the degrees in Case 3 is

(64)

\deg\left(\frac{\partial p}{\partial b_{i,j}}-\frac{\partial p}{\partial a_{i,j}}\right)=2^{m}-2^{i}+2^{i-2}+2^{j-2},\textrm{ for }i=3,\ldots,m,\;\;j=2,\ldots,i-1

and that they are distinct from previous cases.

Case 4 is based on the observation that if we use the idea from Case 3 for $j=i=m$ , we get a degree which already included in Case 1. In order to establish a degree not present in the previous cases, we must form a linear combination of partial derivatives with respect to the elements $c_{m+1}$ , $a_{m,m}$ , $b_{m,m}$ , $a_{m,2}$ and $b_{m,2}$ . Consider the following identities:

(65)	$\displaystyle\frac{\partial p}{\partial b_{4,4}}$	$\displaystyle=$	$\displaystyle X^{12}+6X^{11}+16X^{10}+26X^{9}+\mathcal{O}(X^{8})$
(66)	$\displaystyle\frac{\partial p}{\partial b_{4,4}}-\frac{\partial p}{\partial a_{4,4}}$	$\displaystyle=$	$\displaystyle-X^{8}-4X^{7}-7X^{6}-6X^{5}+\mathcal{O}(X^{4})$
(67)	$\displaystyle\frac{\partial p}{\partial c_{5}}$	$\displaystyle=$	$\displaystyle X^{8}+4X^{7}+7X^{6}+8X^{5}+\mathcal{O}(X^{4}).$

By forming the sum of equations (66) and (67), we obtain

\frac{\partial p}{\partial b_{4,4}}-\frac{\partial p}{\partial a_{4,4}}+\frac{\partial p}{\partial c_{5}}=2X^{5}+4X^{4}+3X^{3}+X^{2}.

Noting from (63) that this degree coincides with Case 3 for $i=4$ and $j=2$ , and we can reduce the degree by forming the sum

\frac{\partial p}{\partial b_{4,4}}-\frac{\partial p}{\partial a_{4,4}}+\frac{\partial p}{\partial c_{5}}+2\left(\frac{\partial p}{\partial b_{4,2}}-\frac{\partial p}{\partial a_{4,2}}\right)=-X^{3}+X^{2}.

The resulting degree 3 is distinct from those in Cases 1-3.

From the reasoning above, in this example, we have 6 distinct degrees from Case 1, 6 distinct degrees from Case 2, 3 distinct degrees from Case 3 and 1 degree from Case 4, which yield a total of $m^{2}=16$ distinct degrees.

In the general case, i.e., when $m>4$ , we also provide a Case 5, leading to an additional to $m-4$ distinct degrees. The idea for this case is very similar to that of Case 4, and corresponds to forming a linear combination of partial derivatives with respect to the elements $a_{i,i}$ , $b_{i,i}$ $a_{i+1}$ , $a_{i,2}$ and $b_{i,2}$ for $i<m$ .

Theorem 10.

For $m>2$ , we have

(68)

\dim(\Pi^{*}_{2^{m}})=m^{2}.

Proof.

See Section A. ∎

As a consequence of Theorem 10, the parameterization (56) is minimal. To our knowledge, this is the first minimal parameterization of unreduced schemes in $\Pi_{2^{m}}^{*}$ .

4.2 Reduced evaluation schemes

In practice, unreduced evaluation schemes are rarely useful for evaluating a given polynomial of high degree $d$ , because of the limited number of degrees of freedom in $\Pi_{2^{m}}^{*}$ for large $m$ , in comparison to $\dim(\Pi_{2^{m}})=d+1=2^{m}+1$ . If the Hessenberg matrices $A$ and $B$ are reduced, we obtain output polynomials of lower degree. For example the pair

(69)

[A|B]=\left[\begin{array}[]{ccccccc|cccccccc}0&1&&&&&&0&1&&&&\\ 0&\times&1&&&&&0&0&1&&&\\ 0&\times&\times&1&&&&0&\times&\times&1&&\\ 0&\times&\times&\times&1&&&0&\times&\times&1&0&\\ 0&\times&\times&\times&\times&1&&0&\times&\times&\times&1&0\\ 0&\times&\times&\times&\times&\times&1&0&\times&\times&\times&\times&1&0\\ \end{array}\right]

corresponds to $m=6$ multiplications but results in a polynomial of degree $32$ . By a single reduction, we mean setting the last nonzero element in a row to zero in either $A$ or $B$ . With $r$ reductions we mean the repeated application of a single reduction. Note that we can still normalize each row since the transformation theorems are also applicable to reduced systems. Hence, we lose one degree of freedom with every reduction, and due to Corollary 9, we expect the corresponding dimension to be

(70)

m^{2}-r.

In the following section we proceed by studying reduced evaluation schemes. More precisely, we study reduced evaluation schemes that lead to specific polynomial degrees and describe ways to compute $(A,B,c)$ for that reduction structure for a polynomial given in a monomial basis.

Figure 1: The polynomial degrees when the Hessenberg matrices in the triplet

(A,B,c)

are reduced Hessenberg matrices. The parameter

\epsilon=0.1

is selected for visualization purposes. Any point above the blue curve can be discarded to not completely contain the corresponding polynomial subset, in the sense of (71), due to an insufficient number of degrees of freedom.

5 Polynomial subsets

This section is devoted to the study of the question: What is the largest $d$ such that all $d$ -degree polynomials can be computed with $m$ multiplications? Formally, we use two versions of this problem

(71)

\max\{d:\Pi_{d}\subset\Pi_{2^{m}}^{*}\}\leq\max\{d:\Pi_{d}\subset\widebar{\Pi_{2^{m}}^{*}}\}.

We consider the left-hand side when possible, and otherwise study the right-hand side in order to avoid limit cases similar to (5).

The previous section stressed the use of reduced matrices. For a given reduction, we can compute the corresponding polynomial degree. Hence, we can investigate candidate solutions to (71) by considering all possible reductions. This approach is depicted in Figure 1, which illustrates all combinations of reductions for up to $m\leq 7$ multiplications and $r\leq 7$ reductions. For instance, (69) corresponds to the scenario (horizontal axis) with $m=6$ multiplications and $r=3$ reductions and achieves a polynomial degree of $32$ , as shown on the vertical axis of the figure.

To identify optimal solutions to (71), higher polynomial degrees are advantageous. However, excessively high degrees may result in insufficient degrees of freedom. More precisely, (70) gives a bound on the degree of the admissible candidate solutions to (71) for a given number of multiplications and reductions. This is visualized with a blue line in the figure. Hence, for the purpose of studying (71) we can disregard points above this line.

The problem (71) becomes increasingly complex as $m$ increases. For $m=3$ , the problem is essentially already solved in [24] since an explicit procedure is provided to compute almost all polynomials of degree $8$ with $m=3$ multiplications, i.e.,

\max\{d:\Pi_{d}\subset\widebar{\Pi_{2^{3}}^{*}}\}=8.

For $m=4$ , similar constructions are also provided in [24], yielding a method for degree $12$ . An alternate approach requiring fewer assumptions on the monomial coefficients is given in Section 5.1. From Figure 1, we see that this is the highest admissible degree and therefore conclude that it is optimal.

For $m\geq 5$ we were not able to solve the problem analytically and instead resorted to computational tools. More precisely, we frame the problem with a given reduction and structure as a system of polynomial equations. For $m=5$ , we use the package HomotopyContinuation.jl [4] to find, as far as we can tell, all solutions. For $m=6$ and $m=7$ , the system was too complicated, and we were not able to find solutions with this package. However, by using the software [14], we were able to construct locally convergent iterative methods and successfully found solutions in all specified test cases. These simulations combined with reasoning based on admissible degrees in Figure 1, led us to state conjectures concerning the solution to (71). For reproducibility, all simulations (including starting values) are provided in the publicly available GitHub repository: https://github.com/GustafLorentzon/polynomial-set-paper.

5.1 Four multiplications

When we study $m=4$ , we identify from Figure 1 that the highest degree of the admissible polynomials is $12$ , since without reductions we only have $16$ degrees of freedom, which cannot parameterize $\Pi_{16}$ . The solution to (71) is indeed 12. This can already be concluded from the method in [24, p. 237] which is a method to evaluate polynomials of degree 12, given in their monomial basis, using only $m=4$ multiplications. In our terminology this corresponds to the reduction $a_{2,3}=0$ and, additionally, $a_{4,2}=0$ . The method in [24, p. 237] involves square roots of expression containing the monomial coefficients, roots of a polynomial of degree four, as well as divisions of certain quantities leading to exceptions corresponding to some limit cases. Therefore, from [24] we conclude that

(72)

\max\{d:\Pi_{d}\subset\widebar{\Pi_{2^{4}}^{*}}\}=12

in a complex sense.

We now present a slightly more general method for evaluating any polynomial in $\Pi_{12}$ with four multiplications. The new method does not involve square roots or divisions except for the leading monomial coefficient. Consider the evaluation schemes with the following structure


(73f)	$\displaystyle\left[\begin{array}[]{c \| c}A&B\end{array}\right]$	$\displaystyle=\left[\begin{array}[]{c c c c c \| c c c c c}0&1&&&&0&1&&\\ 0&1&0&&&0&0&1&\\ 0&a_{32}&a_{33}&1&&0&0&0&1\\ 0&a_{42}&a_{43}&a_{44}&1&0&b_{42}&b_{43}&a_{44}+1&1\end{array}\right]$
(73h)	$\displaystyle c$	$\displaystyle=\left[\begin{array}[]{cccccc}c_{1}&c_{2}&c_{3}&c_{4}&c_{5}\end{array}\right].$

Suppose $\alpha_{0},\ldots,\alpha_{12}$ represent a given polynomial $p(X)=\alpha_{0}I+\alpha_{1}X+\cdots+\alpha_{12}X^{12}\in\Pi_{12}$ . If we expand the parameterization, we obtain a multivariate polynomial system of equations—one for each monomial coefficient in the output polynomial. In the terminology of Section 2.2, the system corresponds to considering the $0$ th, $\ldots$ , $12$ th derivatives of the equation $\Phi(A,B,c)(x)=p(x)$ with respect to $x$ , evaluated at $x=0$ . In this case, we have $13$ equations in $13$ variables. To solve solve this system, we first introduce the auxiliary variables $\beta_{43}=b_{43}+a_{43}$ and $\beta_{42}=b_{42}+a_{42}$ . This system is explicitly solvable by considering the equations in the output polynomial ordered in descending degree, so that the equation corresponding to $\alpha_{12}$ is treated first. In this sense, the system is triangular, which was also crucial for the construction in [24, Section 3]. The solution to the system is given by the following sequence of equations:


(74a)	$\displaystyle c_{6}$	$\displaystyle=\alpha_{12}$
(74b)	$\displaystyle a_{33}$	$\displaystyle=\frac{1}{2}\left(\frac{\alpha_{11}}{c_{6}}\right)$
(74c)	$\displaystyle a_{32}$	$\displaystyle=\frac{1}{2}\left(\frac{\alpha_{10}}{c_{6}}-a_{33}^{2}\right)$
(74d)	$\displaystyle a_{44}$	$\displaystyle=\frac{1}{2}\left(\frac{\alpha_{9}}{c_{6}}-2a_{32}a_{33}-1\right)$
(74e)	$\displaystyle\beta_{43}$	$\displaystyle=\frac{\alpha_{8}}{c_{6}}-\left(a_{33}+2a_{33}a_{44}+a_{32}^{2}\right)$
(74f)	$\displaystyle\beta_{42}$	$\displaystyle=\frac{\alpha_{7}}{c_{6}}-\left(a_{32}+a_{33}\beta_{43}+2a_{32}a_{44}\right)$
(74g)	$\displaystyle c_{5}$	$\displaystyle=\alpha_{6}-c_{6}\left(a_{44}+a_{44}^{2}+a_{33}\beta_{42}+a_{32}\beta_{43}\right)$
(74h)	$\displaystyle a_{43}$	$\displaystyle=\frac{c_{5}}{c_{6}}-\left(a_{33}\frac{c_{5}}{c_{6}}+a_{44}\beta_{43}+a_{32}\beta_{42}\right)$
(74i)	$\displaystyle a_{42}$	$\displaystyle=\frac{\alpha_{4}}{c_{6}}-\left(a_{32}\frac{c_{5}}{c_{6}}+a_{44}\beta_{42}+a_{43}\beta_{43}-a_{43}^{2}\right)$
(74j)	$\displaystyle c_{4}$	$\displaystyle=\alpha_{3}-c_{6}\left(a_{43}\beta_{42}+a_{42}\beta_{43}-2a_{42}a_{43}\right)$
(74k)	$\displaystyle c_{3}$	$\displaystyle=\alpha_{2}-c_{6}\left(a_{42}\beta_{42}-a_{42}^{2}\right).$

With the conditions $c_{2}=\alpha_{1}$ , $c_{1}=\alpha_{0}$ , $b_{43}=\beta_{43}-a_{43}$ and $b_{42}=\beta_{42}-a_{42}$ , we have explicitly computed all variables in (73).

Recall that $\alpha_{12}\neq 0$ for $p\in\Pi_{12}$ ; therefore, we have made no assumptions other than the degree of the polynomial. Moreover, the formulas preserve the algebraic structure of the variables, e.g., if $\alpha_{0},\ldots,\alpha_{12}\in\mathbb{R}$ , then $(A,B,c)$ is a real triplet, so the evaluation coefficients are real. We conclude that

(75)

\max\{d:\Pi_{d}\subset\Pi_{2^{4}}^{*}\}=12

holds in both a real and a complex sense.

5.2 Five multiplications

For $m=5$ multiplications we see in Figure 1 that the highest degree of the admissible polynomials is $20$ . To our knowledge, the state of the art is $d=18$ as given in [25, Equation (17)-(19)] with $s=2$ which is based on [24] combined with the Paterson–Stockmeyer evaluation. In our terminology, that approach corresponds to the reduction $a_{23}=a_{56}=0$ and additionally imposing $a_{42}=0$ . The reduction in Figure 1 leading to a polynomial of degree $d=20$ corresponds to $a_{45}=a_{56}=0$ .

We were not able to explicitly derive a solution to the multivariate polynomial system for the structure with $d=20$ analytically; instead, we needed to resort to computational tools. The Julia package HomotopyContinuation.jl [4] includes methods to solve polynomial systems of equations based on numerical continuation and with advanced initialization of starting points for the homotopy method. We implemented the system equations derived from considering each monomial coefficient for the data structures of this package. Since we have more variables than equations for this structure, we empirically fixed some variables, choosing which variable to fix by trying to reduce the total degree of the system as much as possible. We additionally solved those equations that could be solved explicitly, e.g., the first and last equations.

With this setup, we were able to compute $(A,B,c)$ for a large number of given polynomials $p\in\Pi_{20}$ , including the truncated Taylor expansion of the exponential as well as the function $1/(1-x)$ . The simulations were done in Julia, and the code is given in the GitHub repository. Moreover, code to actually evaluate polynomials is provided in both Julia and Matlab. For conciseness, we report only the numbers for the matrix exponential in the following.

In an attempt to prevent large condition numbers, we sought solutions that did not have excessively large or small values. In this case, we mitigated large numbers by scaling the input. For the exponential, we approximate $e^{\alpha x}$ with $\alpha=8$ fixed, since this makes $c_{7}=\alpha^{20}/20!\approx 0.47$ in the order of magnitude one. Although the input scaling can be reversed by transforming entries in the table, it was not deemed numerically useful and therefore it was not included in the following presentation of results.

HomotopyContinuation.jl found several solutions and the solution with the smallest values was the following


(76f)	$\displaystyle[A\|B]$	$\displaystyle=\left[\begin{array}[]{cccccc\|cccccc}0&1&&&&&0&1&&&&\\ 0&0&1&&&&0&\frac{1}{2}&1&&&\\ 0&0&2&1&&&0&b&1&1&&\\ 0&a&a&1&0&&0&b&b&b&1&\\ 0&a&a&a&1&0&0&b&b&b&b&1\end{array}\right]$
(76h)	$\displaystyle c$	$\displaystyle=\left[\begin{array}[]{cccccccc}c&c&c&c&c&c&c\end{array}\right]$

where the missing values are given in Table 1. The Jacobian of the polynomial system evaluated at this solution has a condition number of $8.1\cdot 10^{2}$ .

$a_{42}$	2.3374451754385963	$c_{1}$	$1$
$a_{43}$	$-41/16$	$c_{2}$	$\alpha$
$a_{52}$	2.8309861554443847	$c_{3}$	-6.657689892163032
$a_{53}$	8.7616118485412	$c_{4}$	50.445902670306005
$a_{54}$	5.123957592622475	$c_{5}$	19.754729172913187
$b_{32}$	1.4484649122807018	$c_{6}$	2.8090057997411706
$b_{42}$	6.389966463262669	$c_{7}$	$\alpha^{20}/20!$
$b_{43}$	6.697361614351532
$b_{44}$	2.1451472591988834
$b_{52}$	-2.458444697550387
$b_{53}$	-3.6724346694235876
$b_{54}$	16.044090747953085
$b_{55}$	7.557067023178642

Table 1: Non-specified values in (76)

Since we were able to find solutions in the case studies we conjecture that this corresponds to a realization of a method for the maximum polynomial degree.

Conjecture 11.

\max\{d:\Pi_{d}\subset\widebar{\Pi_{2^{5}}^{*}}\}=20.

The upcoming work [26] suggests that there is indeed a constructive way to form such evaluation schemes for five multiplications.

5.3 Six multiplications

To our knowledge, the state of the art for $m=6$ multiplications is $d=24$ , again given in [25] with $s=3$ . In our terminology, that corresponds to the reduction $a_{23}=a_{33}=a_{34}=a_{67}=0$ . Similar to the situation for $m=5$ , this is not the highest admissible degree. From Figure 1 we see that the highest admissible degree is $d=32$ . With $r=3$ , we can use the reduction (69).

Unfortunately, the application of the package HomotopyContinuation.jl was not successful for this case. We have $33$ equations, and $m^{2}-r=33$ unknowns. The creation of initial vectors for the homotopy continuation seems too computationally demanding, likely related to the high total degree of the polynomial system. Instead we used the package GraphMatFun.jl [14] to create a locally convergent iterative solution method. The system is highly ill-conditioned and a standard Newton approach was not successful. Instead, we employ an iteratively regularized Newton’s method. Following the approach in [6], we compute a Tikhonov–Newton step by applying Newton’s method to a Tikhonov-regularized system. The Tikhonov–Newton step can be computed using the singular value decomposition of the Jacobian matrix, which is directly available from the graph representation in GraphMatFun.jl. The best results were obtained by using Armijo step-length damping and selecting between a Newton and a Tikhonov–Newton step in a greedy fashion. For the sake of reproducibility, the starting vectors are given explicitly in the software available in the GitHub repository. In order to determine conclusively that a solution is found, the solution was post-processed with high-precision arithmetic (BigFloat in Julia) so that the first 50 decimals of the coefficients appear accurate. This was done for this particular simulation as well as for all subsequent simulations.

The above approach with the structure (69) was successful in finding a solution vector for the problem corresponding to the Taylor expansion of the matrix exponential, using complex arithmetic. Unfortunately, we were not able to find a real coefficient vector.

From an application viewpoint, real coefficients are advantageous, e.g., since the evaluation of $p(A)$ can be done in real arithmetic if $A$ is real. In order to find a real evaluation scheme, we investigated instead the structure

[A|B]=\left[\begin{array}[]{ccccccc|ccccccc}0&1&&&&&&0&1&&&&\\ 0&\times&1&&&&&0&\times&1&&&\\ 0&\times&\times&1&&&&0&\times&1&0&&\\ 0&\times&\times&\times&1&&&0&\times&\times&\times&1&\\ 0&\times&\times&\times&\times&1&&0&\times&\times&\times&1&0\\ 0&\times&\times&\times&\times&\times&1&0&\times&\times&\times&\times&1&0\\ \end{array}\right].

This leads to a polynomial of degree $d=30$ . Using the locally convergent iterative method described above, we successfully computed a real solution vector.

For the actual numbers, we refer to the software. The maximum absolute value in the coefficient vector for the complex case is $3.7\cdot 10^{5}$ , and the Jacobian evaluated at the solution has a condition number of $5.8\cdot 10^{11}$ . The maximum absolute value in the coefficient vector for the real case is $5.0\cdot 10^{4}$ , and the condition number of the Jacobian at the solution is $2.10\cdot 10^{10}$ .

The highest degree polynomial set can indeed be different for the real and complex cases, and the simulations suggest the following.

Conjecture 12 (Six multiplications).

For complex coefficients,

\max\{d:\Pi_{d}\subset{\widebar{\Pi_{2^{6}}^{*}}}\}=32.

For real coefficients,

\max\{d:\Pi_{d}\subset{\widebar{\Pi_{2^{6}}^{*}}}\}=30.

5.4 Seven multiplications

For seven multiplications, we believe that the state of the art is $d=30$ [25, Table 3] with $s=4$ , which in our context corresponds to the reduction $a_{23}=a_{33}=a_{34}=a_{43}=a_{44}=a_{45}=0$ and $a_{78}=0$ . In Figure 1 we see that the highest admissible degree appears to be $d=42$ with $r=5$ . Using the same technique as for $m=6$ , we were able to find a solution to the system for the Taylor expansion of the exponential. We used scaling $\alpha=16$ , complex arithmetic, and the following structure:

(77)

[A|B]=\left[\begin{array}[]{cccccccc|ccccccccc}0&1&&&&&&&0&1&&&\\ 0&\times&1&&&&&&0&\times&1&&\\ 0&\times&0&1&&&&&0&\times&1&0&\\ 0&\times&\times&\times&1&&&&0&\times&\times&1&0&\\ 0&\times&0&\times&\times&1&&&0&\times&\times&\times&1&0&\\ 0&\times&\times&\times&\times&\times&1&&0&\times&\times&\times&\times&1&0&\\ 0&\times&\times&\times&\times&\times&\times&1&0&\times&\times&\times&\times&\times&1&0&\\ \end{array}\right].

The fixed elements $a_{3,3}=a_{5,3}=0$ were selected empirically in an attempt to avoid very large condition numbers. The maximum absolute value in the coefficient vector is $2.8\cdot 10^{6}$ , and the condition number of the Jacobian at the solution is $2.9\cdot 10^{13}$ . From this we conclude the following in a complex sense.

Conjecture 13 (Seven multiplications).

\max\{d:\Pi_{d}\subset{\widebar{\Pi_{2^{7}}^{*}}}\}=42.

Remark 14 (Computational example).

Although the main contributions of this manuscript are theoretical in character, we also wish to point out the practical value of the simulations. For example, the output of the simulations concerning $m=7$ multiplications can be directly used to compute the matrix exponential, in a very competitive way. In fact, in theory (in the sense of number of floating point operations) the method is the fastest for large norm matrices, as far as we know. Although a complete computational study is beyond the scope of this paper, we provide (for a specific but random matrix) a comparison with the scaling-and-squaring algorithm [18], which is the most commonly method used in mathematical software for the matrix exponential. In the following we see that our approach is faster or more accurate, or both depending on viewpoint, than two valid parameter choices for the scaling-and-squaring method. What is marked as our method is the evaluation (77) with coefficients precomputed with the Tikhonov-Newton method.

julia> Random.seed!(0); setprecision(128); n=200; alpha=16;
julia> A=randn(Complex{BigFloat},n,n); A=20*A/norm(A);
julia> E1 = @btime exp16_deg42_bigfloat(A/alpha); # Our method
  58.648 s (719681983 allocations: 29.50 GiB)
julia> E2 = @btime exp_sas_6mult_1div(A); # standard version 1
  65.340 s (762205498 allocations: 31.24 GiB)
julia> E3 = @btime exp_sas_7mult_1div(A); # standard version 2
  80.014 s (860125595 allocations: 35.25 GiB)
julia> # Compute a reference solution with high precision
julia> expAref=mapreduce(i-> (A^i)/factorial(big(i)), +, 0:100);
julia> norm(expAref-E1); # Error for our method
9.25199599560488132729984839075891589943e-33
julia> norm(expAref-E2); # Error for standard implementation version 1
1.935440065066927257455720406200123063236e-30
julia> norm(expAref-E3); # Error for standard implementation version 2
1.14214602689126618911305625822908609328e-36

6 Conclusions

This work focuses on a characterization of the set $\Pi_{2^{m}}^{*}$ , with particular attention given to minimality and to computing the maximum degree polynomial subset of $\Pi_{2^{m}}^{*}$ in the sense of (71). From our perspective, the minimality question is well understood in this paper. The determination of the maximum degree polynomial subset can be further investigated. We have only described the case $m\leq 7$ using computational reasoning. Both a theoretical description of the general case, e.g., using further tools from algebraic geometry [33], and a more general computational approach could be of interest and useful in practice.

Based on our simulations, one major component is missing before this can be directly used in matrix function evaluation software: understanding the effect of rounding errors. Evaluating high-degree polynomials is, in general, prone to rounding errors—unless special representations such as a Chebyshev basis are used. In this case, the issue appears even more intricate. For example, by using high-precision arithmetic, the coefficients were computed such that we could guarantee correctness in full double precision. However, the fact that the system has a rather large condition number (at least for $m=7$ ) is an indication that this evaluation is sensitive with respect to these coefficients. Heuristics similar to [15] might be applicable in a general setting. Further work is needed to determine which evaluation schemes, in the continuum of $(A,B,c)$ , lead to better numerical stability; a necessary condition for numerical stability is that the condition number is not too large.

The Paterson–Stockmeyer method has proven beneficial not only for evaluating matrix polynomials; similar computational challenges arise in other contexts, such as when the input $X$ is a polynomial. In these scenarios, multiplying two quantities is significantly more computationally demanding than forming linear combinations. The construction in the Paterson–Stockmeyer approach resembles the baby-step giant-step (BSGS) technique introduced in [31], which has found various applications and has been combined with the Paterson–Stockmeyer approach in public key and privacy-preserving cryptography [12]. Moreover, both the Paterson–Stockmeyer method and BSGS serve as valuable tools in high-precision arithmetic [16, 32]. Open research questions include how the approach presented in this paper, or methods for $\Pi_{2^{m}}^{*}$ in general, can be applied in these contexts.

The fixed-cost computation approach presented in this paper can be complemented by insights from research on composite polynomials or deep polynomials. See [20] for composite polynomials. Rational approximations corresponding to this concept, such as those in [10], illustrate how successive compositions, e.g., $p(f(g(x)))$ , can achieve rapid convergence in terms of both the number of compositions and the parameters involved. Similar findings are noted in [34], motivated by the link between this approach and universal approximation in deep learning. The composite polynomial method can fit within the framework of this paper by zeroing certain elements in matrices $A$ and $B$ . Nevertheless, there is a significant distinction in research objectives: our objective is to minimize the number of matrix-matrix multiplications, while [34] focuses on reducing the number of parameters. Although some results, such as the approximation of the $p$ th root [10], may be directly applicable, further research is needed to fully explore the differences resulting from these objectives.

Acknowledgements

The authors wish to express gratitude for the insightful discussions and feedback from Prof. Kathlén Kohn (KTH Royal Insititute of Technology), particularly regarding Section 2.2. This research was partially conducted during the first author’s sabbatical at EPFL / University of Geneva. The support of the hosts, Prof. Daniel Kressner and Prof. Bart Vandereycken, is greatly appreciated. The authors also greatly acknowledge the valuable comments from Prof. Massimiliano Fasi (University of Leeds) and Prof. Jorge Sastre (Polytechnic University of Valencia).

References

[1] A. H. Al-Mohy and N. J. Higham. A new scaling and squaring algorithm for the matrix exponential. SIAM J. Matrix Anal. Appl., 31(3):970–989, 2010.
[2] P. Bader, S. Blanes, and F. Casas. Computing the matrix exponential with an optimized Taylor polynomial approximation. Mathematics, 7(12):1174, 2019.
[3] S. Blanes, N. Kopylov, and M. Seydaoğlu. Efficient scaling and squaring method for the matrix exponential. SIAM J. Matrix Anal. Appl., 46(1):74–93, 2025.
[4] P. Breiding and S. Timme. HomotopyContinuation.jl: A Package for Homotopy Continuation in Julia. In Int. Congr. Math. Softw, pages 458–465. Springer, 2018.
[5] R. Byers. Solving the algebraic Riccati equation with the matrix sign function. Linear Algebra Appl., 85:267–279, 1987.
[6] J. Eriksson and P.-Å. Wedin. Regularization methods for almost rank-deficient nonlinear problems. In B. Jacobsen, K. Mosegaard, and P. Sibani, editors, Inverse Methods, pages 295–302. Springer-Verlag, Berlin, 1996.
[7] E. Estrada and D. J. Higham. Network properties revealed through matrix functions. SIAM Rev., 52(4):696–714, 2010.
[8] M. Fasi. Optimality of the Paterson–Stockmeyer method for evaluating matrix polynomials and rational matrix functions. Linear Algebra Appl., 574:182–200, 2019.
[9] M. Fasi, S. Gaudreault, K. Lund, and M. Schweitzer. Challenges in computing matrix functions, 2024. Arxiv: 2401.16132.
[10] E. S. Gawlik and Y. Nakatsukasa. Approximating the $p$ th root by composite rational functions. J. Approx. Theory, 266:105577, 2021.
[11] G. Golub and C. Van Loan. Matrix Computations. The Johns Hopkins University Press, 2013. 4th edition.
[12] K. Han and D. Ki. Better bootstrapping for approximate homomorphic encryption. In Topics in Cryptology – CT-RSA 2020, pages 364–390, 2020.
[13] N. J. Higham. Functions of Matrices: Theory and Computation. SIAM, Philadelphia, 2008.
[14] E. Jarlebring, M. Fasi, and E. Ringh. Computational graphs for matrix functions. ACM Trans. Math. Softw., 48(4):39, 2023.
[15] E. Jarlebring, J. Sastre, and J. Ibáñez. Polynomial approximations for the matrix logarithm with computation graphs. Linear Algebra Appl., 2024. In press.
[16] F. Johansson. Evaluating parametric holonomic sequences using rectangular splitting. In Proc. 39th Int. Symp. Symbolic Algebraic Comput. (ISSAC ’14), pages 256–263, 2014.
[17] B. Mishra. Algorithmic Algebra. Applied Mathematical Sciences. Springer-Verlag, 1993.
[18] C. Moler and C. Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev., 45(1):3–49, 2003.
[19] M. S. Paterson and L. J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J. Comput., 2(1):60–66, 1973.
[20] J. F. Ritt. Prime and composite polynomials. Trans. Amer. Math. Soc., 23(1):51–66, 1922.
[21] E. H. Rubensson, G. Lorentzon, and E. Jarlebring. Recursive expansion of the matrix step function using eight-degree polynomials. In progress, 2025.
[22] E. H. Rubensson, E. Rudberg, and P. Sałek. Density matrix purification with rigorous error control. J. Chem. Phys., 128:074106, 2008.
[23] J. Sastre. Efficient mixed rational and polynomial approximation of matrix functions. Appl. Math. Computation, 218(24):11938–11946, August 2012.
[24] J. Sastre. Efficient evaluation of matrix polynomials. Linear Algebra Appl., 539:229–250, 2018.
[25] J. Sastre and J. Ibáñez. Efficient evaluation of matrix polynomials beyond the Paterson–Stockmeyer method. Mathematics, 9(14):1600, July 2021.
[26] J. Sastre, J. Ibáñez, J. M. Alonso, and E. Defez. Beyond paterson–stockmeyer: Advancing matrix polynomial computation. Presented at the 5th Int. Conf. on Applied Mathematics, Computational Science and Systems Engineering, Institut Henri Poincaré, Paris, France, Apr. 14–16, 2025.
[27] J. Sastre, J. Ibáñez, and E. Defez. Boosting the computation of the matrix exponential. Appl. Math. Computation, 340:206–220, January 2019.
[28] J. Sastre, J. Ibáñez, E. Defez, and P. Ruiz. New scaling-squaring Taylor algorithms for computing the matrix exponential. SIAM J. Sci. Comput., 37:A439–A455, 2015.
[29] J. Sastre, J. Ibáñez P. Ruiz, and E. Defez. Efficient computation of the matrix cosine. Appl. Math. Computation, 219(14):7575–7585, March 2013.
[30] I. R. Shafarevich and M. Reid. Basic algebraic geometry, volume 1. Springer, 1994.
[31] D. Shanks. Class number, a theory of factorization and genera. In Proc. Sympos. Pure Math., volume 20, pages 415–440, Providence, RI, 1971. Amer. Math. Soc.
[32] D. M. Smith. Efficient multiple-precision evaluation of elementary functions. Appl. Math. Computation, 52(185):131–134, January 1989.
[33] B. Sturmfels and M. Michalek. Invitation to Nonlinear Algebra. Amer. Math. Soc., Providence, RI, english edition, July 2021.
[34] K. Yeon. Deep univariate polynomial and conformal approximation, 2025. arXiv:2503.00698.

Appendix A Auxiliary material for the proof of Theorem 10

The proof is based on two technical lemmas. We note that several of these results hold for arbitrary choices of $(A,B,c)$ , but some parts of the statements and some of the proofs are clearer when we assume the specific structure of the triplet $(A,B,c)$ given in (57), which is sufficient for the proof of Theorem 10. Note that for our example, $Q_{j}=A_{j}B_{j}$ , $A_{j}=Q_{2}+Q_{j-1}$ , and $B_{j}=Q_{2}+\cdots+Q_{j-1}$ .

The first lemma states a formula for the degree when we apply a differentiation operator consisting of a linear combination of derivatives of the elements in $(A,B,c)$ but not of the last rows of $A$ and $B$ .

Lemma 15.

Let $(A,B,c)$ be of the specific structure given in (57). Let $\mathcal{D}_{i}$ be an operator consisting of a linear combination of the elements of $A,B$ in rows $1,2,\ldots,i$ . Assume that

\deg(\mathcal{D}_{i}Q_{3})\leq\cdots\leq\deg(\mathcal{D}_{i}Q_{i+1})<\deg(\mathcal{D}_{i}Q_{i+2})

and that $\deg(\mathcal{D}_{i}Q_{i+2})\geq 1$ . Then, $\deg(\mathcal{D}_{i}Q_{i+2})<\cdots<\deg(\mathcal{D}_{i}Q_{m+2})$ and

(78)

\deg(\mathcal{D}_{i}p)=\deg(\mathcal{D}_{i}Q_{m+2})=2^{m-1}+\cdots+2^{i}+\deg(\mathcal{D}_{i}Q_{i+2}).

Proof.

We prove the statement by induction, starting with row $i$ . By applying the product rule and using $\mathcal{D}_{i}Q_{2}=0$ , we obtain:

(79)

\mathcal{D}_{i}Q_{i+3}=(\mathcal{D}_{i}A_{i+3})B_{i+3}+A_{i+3}(\mathcal{D}_{i}B_{i+3})=\\ (\mathcal{D}_{i}Q_{i+2})C_{i+3}+A_{i+3}\left(\mathcal{D}_{i}Q_{3}+\cdots+\mathcal{D}_{i}Q_{i+1}\right)

where $C_{i+3}=A_{i+3}+B_{i+3}$ . Note that $\deg(C_{i+3})=\deg(A_{i+3})=2^{i}$ . Therefore, on the right-hand side of equation (79), the degree of the first term is larger than that of the second term, given our assumption that $\deg(\mathcal{D}_{i}Q_{3})\leq\cdots\leq\deg(\mathcal{D}_{i}Q_{i+1})<\deg(\mathcal{D}_{i}Q_{i+2})$ . Thus, we have:

(80)

\deg(\mathcal{D}_{i}Q_{i+3})=\deg(A_{i+3})+\deg(\mathcal{D}_{i}Q_{i+2})=2^{i}+\deg(\mathcal{D}_{i}Q_{i+2})>\deg(\mathcal{D}_{i}Q_{i+2}).

This establishes the base step of our induction.

To proceed with the induction step, assume that the inequality holds for $j$ steps, i.e.,

\deg(\mathcal{D}_{i}Q_{i+2})<\cdots<\deg(\mathcal{D}_{i}Q_{i+j+1}).

Analogous to the derivation in equation (79), we have:

\mathcal{D}_{i}Q_{i+j+2}=(\mathcal{D}_{i}Q_{i+j+1})C_{i+j+2}+A_{i+j+2}\left(\mathcal{D}_{i}Q_{3}+\cdots+\mathcal{D}_{i}Q_{i+j}\right).

Since $C_{i+j+2}$ and $A_{i+j+2}$ have the same degree, and by our induction assumption, the first term has a higher degree, we conclude that:

(81)

\deg(\mathcal{D}_{i}Q_{i+j+2})=\deg(C_{i+j+2})+\deg(\mathcal{D}_{i}Q_{i+j+1})=2^{i+j-1}+\deg(\mathcal{D}_{i}Q_{i+j+1}).

This completes the proof of the increasing degree progression. The relation (78) follows by applying equation (81) for $j=m-i,\ldots,2$ and using (80) once. ∎

The previous lemma (Lemma 15) gives us the degree of the partial derivative of the output, given the degree of the partial derivative of $Q_{i+2}$ . It remains to determine the derivatives of $Q_{i+2}$ . We select the differentiation operator in several ways for the purpose of later combining them to form distinct degrees of the Jacobian.

Lemma 16.

Let $(A,B,c)$ be of the specific structure given in (57). Then, for $i=2,\ldots,m$ and $j=2,\ldots,i$

(82)

\frac{\partial Q_{i+2}}{\partial a_{i,j}}=Q_{j}B_{i+2}

and for $i=3,\ldots,m$ and $j=2,\ldots,i$ ,

(83)

\left(\frac{\partial}{\partial b_{i,j}}-\frac{\partial}{\partial a_{i,j}}\right)Q_{i+2}=Q_{j}(A_{i+2}-B_{i+2})

Moreover, for $i=5,\ldots m$ ,

(84)

\left(\frac{\partial}{\partial b_{i-1,i-1}}-\frac{\partial}{\partial a_{i-1,i-1}}+2\left(\frac{\partial}{\partial b_{i-1,2}}-\frac{\partial}{\partial a_{i-1,2}}\right)+2\frac{\partial}{\partial a_{i,i}}\right)Q_{i+2}=B_{i+2}B_{i-1}Q_{2}+\mathcal{O}(X^{r}),

where $r={2^{i-1}}$ and also,

(85)

\left(\frac{\partial}{\partial b_{m,m}}-\frac{\partial}{\partial a_{m,m}}+2\left(\frac{\partial}{\partial b_{m,2}}-\frac{\partial}{\partial a_{m,2}}\right)+\frac{\partial}{\partial c_{m+1}}\right)p=Q_{2}(2Q_{2}-B_{m}).

Proof.

We prove (82) by applying the product rule:

(86)

\frac{\partial Q_{i+2}}{\partial a_{i,j}}=\frac{\partial A_{i+2}}{\partial a_{i,j}}B_{i+2}+A_{i+2}\frac{\partial B_{i+2}}{\partial a_{i,j}}=Q_{j}B_{i+2}.

Similarly,

(87)

\displaystyle\frac{\partial Q_{i+2}}{\partial b_{i,j}}=Q_{j}A_{i+2}.

Equation (83) is an immediate consequence of (86) and (87).

For notational convenience, we define the differential operator:

(88)

\mathcal{D}_{k}:=\frac{\partial}{\partial b_{k,k}}-\frac{\partial}{\partial a_{k,k}}+2\left(\frac{\partial}{\partial b_{k,2}}-\frac{\partial}{\partial a_{k,2}}\right),

for which we derive the auxiliary relation:


(89a)	$\displaystyle\mathcal{D}_{k}Q_{k+2}$	$\displaystyle=(A_{k+2}-B_{k+2})(2Q_{2}+Q_{k})$
(89b)		$\displaystyle=(Q_{2}-B_{k+1})(Q_{2}+A_{k+1})$
(89c)		$\displaystyle=Q_{2}^{2}+Q_{2}(A_{k+1}-B_{k+1})-A_{k+1}B_{k+1}$
(89d)		$\displaystyle=2Q_{2}^{2}-Q_{2}B_{k}-Q_{k+1}.$

Here, we have used (83) with $i=k$ , $j=2$ and $j=i$ , in the first equality, and the fact that for the specific structure given in (57), we have $A_{k+2}-B_{k+2}=Q_{2}-B_{k+1}$ .

To prove (85), we use (89) with $k=m$ :

(90)

\left(\mathcal{D}_{m}+\frac{\partial}{\partial c_{m+1}}\right)p=2Q_{2}^{2}-Q_{2}B_{m}-Q_{m+1}+Q_{m+1}=2Q_{2}^{2}-Q_{2}B_{m}.

To prove (84), we use that $\mathcal{D}_{i-1}Q_{2}=\cdots=\mathcal{D}_{i-1}Q_{i}=0$ since these elements are independent of row $i-1$ . This, together with the chain rule, yields:


(91a)	$\displaystyle\mathcal{D}_{i-1}Q_{i+2}$	$\displaystyle=B_{i+2}\mathcal{D}_{i-1}A_{i+2}+A_{i+2}\mathcal{D}_{i-1}B_{i+2}$
(91b)		$\displaystyle=(A_{i+2}+B_{i+2})\mathcal{D}_{i-1}Q_{i+1}$
(91c)		$\displaystyle=(A_{i+2}-B_{i+2})\mathcal{D}_{i-1}Q_{i+1}+2B_{i+2}\mathcal{D}_{i-1}Q_{i+1}.$

The final equality is a reformulation which simplifies the following derivation,


(92a)	$\displaystyle\left(\mathcal{D}_{i-1}+2\frac{\partial}{\partial a_{i,i}}\right)Q_{i+2}$	$\displaystyle=(A_{i+2}-B_{i+2})\mathcal{D}_{i-1}Q_{i+1}+2B_{i+2}\left(\mathcal{D}_{i-1}Q_{i+1}+Q_{i}\right)$
(92b)		$\displaystyle=(A_{i+2}-B_{i+2})(2Q_{2}^{2}-Q_{2}B_{i-1}-Q_{i})+2B_{i+2}\left(2Q_{2}^{2}+Q_{2}B_{i-1}\right),$

where we have used (82) in the first equality. The degree of the first term in the right hand side is given by

(93)

\displaystyle\deg((A_{i+2}-B_{i+2})(2Q_{2}^{2}-Q_{2}B_{i-1}-Q_{i}))

\displaystyle=\deg(Q_{i}Q_{i})=2^{i-1},

and the degree of the second term is given by

(94)

\displaystyle\deg(B_{i+2}B_{i-1}Q_{2})=2^{i-1}+2^{i-4}+1.

The theorem conclusion (84) follows from Equation (92) and Equation (94).

∎

Proof of Theorem 10

We want to prove the equality in (68); but the upper bound is already given in Corollary 9. To prove that $m^{2}$ is also a lower bound it is sufficient to find one point, i.e., one triplet, with a Jacobian of rank $m^{2}$ . This is done by finding particular linear combinations of partial derivatives of the entries in $(A,B,c)$ with $m^{2}$ distinct degrees. We assume the structure given in (57). We construct linear combinations of partial derivatives in 5 different ways. We refer to these linear combinations as Cases 1, 2, 3, 4 and 5.

Case 1: The fact that the evaluation scheme is unreduced means $\deg(Q_{1})=1$ , and $\deg(Q_{j})=2^{j-2}$ for $j=2,\ldots,m+2$ . This, together with $\deg\left(\frac{\partial p}{\partial c_{j}}\right)=\deg(Q_{j})$ , directly implies (58).

For Case 2, we consider the operator

\mathcal{D}_{i}:=\frac{\partial}{\partial a_{i,j}},

for $i=2,\ldots,m$ and $j=2,\ldots,i$ . Since $a_{i,j}$ does not appear in the first $i-1$ rows, we have $\mathcal{D}_{i}Q_{2}=\cdots=\mathcal{D}_{i}Q_{i+1}=0$ . Consequently, the conditions of Lemma 15 are satisfied. Moreover, (82) implies that

(95)

\deg(\mathcal{D}_{i}Q_{i+2})=\deg(Q_{j}B_{i+2})=\deg(Q_{j})+\deg(Q_{i+1})=2^{i-1}+2^{j-2}.

Combining this with (78) yields


(96a)	$\displaystyle\deg(\mathcal{D}_{i}p)$	$\displaystyle=\deg(\mathcal{D}_{i}Q_{m+2})=2^{m-1}+\cdots+2^{i}+\deg(\mathcal{D}_{i}Q_{i+2})$
(96b)		$\displaystyle=2^{m-1}+\cdots+2^{i-1}+2^{j-2}.$

We conclude (61).

For Case 3, we consider the operator

\mathcal{D}_{i}:=\left(\frac{\partial}{\partial b_{i,j}}-\frac{\partial}{\partial b_{i,j}}\right),

for $i=3,\ldots,m$ and $j=2,\ldots,i-1$ . The conditions of Lemma 15 are satisfied by the same reasoning as in Case 2. From (83) we obtain

(97)

\deg(\mathcal{D}_{i}Q_{i+2})=\deg(Q_{j}(A_{i+2}-B_{i+2}))=\deg(-Q_{j}Q_{i})=2^{i-2}+2^{j-2}.

This, together with (78), implies (64).

For Case 4, we consider the operator

\mathcal{D}_{m}:=\left(\frac{\partial}{\partial b_{m,m}}-\frac{\partial}{\partial a_{m,m}}+2\left(\frac{\partial}{\partial b_{m,2}}-\frac{\partial}{\partial a_{m,2}}\right)+\frac{\partial}{\partial c_{m+1}}\right).

From equation (84) we immediately conclude that

(98)

\displaystyle\deg(\mathcal{D}_{m}p)=\deg(\mathcal{D}_{m}Q_{m+2})=\deg(Q_{2}B_{m})=2^{m-3}+1.

For Case 5, we consider the operator

\mathcal{D}_{i}:=\left(\frac{\partial}{\partial b_{i-1,i-1}}-\frac{\partial}{\partial a_{i-1,i-1}}+2\left(\frac{\partial}{\partial b_{i-1,2}}-\frac{\partial}{\partial a_{i-1,2}}\right)+2\frac{\partial}{\partial a_{i,i}}\right),

for $i=5,\ldots,m$ . We show that this operator satisfies the assumptions of Lemma 15. Firstly, we observe that $\mathcal{D}_{i}Q_{2}=\ldots=\mathcal{D}_{i}Q_{i}=0$ by the same reasoning as in Case 2. Therefore, it is enough to show that $\deg(\mathcal{D}_{i}Q_{i+1})<\deg(\mathcal{D}_{i}Q_{i+2})$ . From (84) and (94) we have that $\deg(\mathcal{D}_{i}Q_{i+2})=2^{i-1}+2^{i-4}+1$ . Next, we observe that


(99a)	$\displaystyle\deg\left(\mathcal{D}_{i}Q_{i+1}\right)$	$\displaystyle=\deg\left(\left(\frac{\partial}{\partial b_{i-1,i-1}}-\frac{\partial}{\partial a_{i-1,i-1}}+2\left(\frac{\partial}{\partial b_{i-1,2}}-\frac{\partial}{\partial a_{i-1,2}}\right)+2\frac{\partial}{\partial a_{i,i}}\right)Q_{i+1}\right)$
(99b)		$\displaystyle=\deg\left((A_{i+1}-B_{i+1})(Q_{2}+Q_{i-1})\right)$
(99c)		$\displaystyle=\deg(Q_{i-1}^{2})=2^{i-1}<2^{i-1}+2^{i-4}+1.$

Thus the assumptions of Lemma 15 are satisfied. It follows that


(100a)	$\displaystyle\mathcal{D}_{i}p=\mathcal{D}_{i}Q_{i+2}$	$\displaystyle=2^{m-1}+\cdots+2^{i-1}+2^{i-4}+1,\quad i=5,\ldots,m,$
(100b)		$\displaystyle=2^{m-1}+\cdots+2^{i}+2^{i-3}+1,\quad i=4,\ldots,m-1.$

To summarize, we have obtained the following degrees through linear combinations of partial derivatives.

•

Case 1: $0,2^{0},2^{1},\ldots,2^{m}$ .
•

Case 2: $2^{m-1}+\cdots+2^{i}+2^{i-1}+2^{j-2},\;\;\textrm{ for }j=2,\ldots,i,\;\;i=2,\ldots,m$ .
•

Case 3: $2^{m-1}+\cdots+2^{i}+2^{i-2}+2^{j-2},\;\;\;\;\;\textrm{ for }j=2,\ldots,i-1,\;\;i=3,\ldots,m$ .
•

Case 4: $2^{m-3}+1$ .
•

Case 5: $2^{m-1}+\cdots+2^{i}+2^{i-3}+1,\;\;\;\textrm{ for }i=4,\ldots,m-1.$

These form distinct degrees, in the following way. Firstly, we note that when the degrees are expressed as binary numbers, Case 1 and Case 4 always involve one and two nonzeros respectively, making these degrees distinct. Similarly, when we express the degrees of Case 2, 3 and 5 as binary numbers, they always involve three or more nonzeros, making them distinct from Case 1 and Case 4. For Case 2, Case 3 or Case 5 to coincide, the degrees as binary numbers must have the same number on nonzeros. This corresponds to choosing the same $i$ in each of the formulas, which leads to different degrees. Therefore we conclude that all the above degrees are distinct from each other.

Counting the number of unique degrees, we get: Case 1: $m+1$ ; Case 2: $m(m-1)/2$ ; Case 3: $(m-1)(m-2)/2$ ; Case 4: $1$ ; Case 5: $m-4$ . In total, we have $m^{2}$ distinct degrees for $m\geq 4$ .

The polynomial set associated with a fixed number of matrix-matrix multiplications

Abstract

1 Introduction

2 Evaluation scheme

2.1 Definition of the evaluation scheme

Examples of instances of Π2m∗\Pi_{2^{m}}^{*}

2.2 The semi-algebraic set perspective of Π2m∗\Pi_{2^{m}}^{*}

3 Equivalence transformations

3.1 Substitution transformations

Theorem 1.

Proof.

Theorem 2.

Proof.

3.2 Normalized forms and unreduced schemes

Definition 3.

Corollary 4.

Definition 5.

Corollary 6.

3.3 Further transformations

Theorem 7.

Proof.

Theorem 8.

Proof.

Free variables in Theorem 8

4 Minimality of parameterization

4.1 Minimality of unreduced evaluation schemes

Corollary 9.

Theorem 10.

Proof.

4.2 Reduced evaluation schemes

5 Polynomial subsets

5.1 Four multiplications

5.2 Five multiplications

Conjecture 11.

5.3 Six multiplications

Conjecture 12 (Six multiplications).

5.4 Seven multiplications

Conjecture 13 (Seven multiplications).

Remark 14 (Computational example).

6 Conclusions

Acknowledgements

References

Appendix A Auxiliary material for the proof of Theorem 10

Lemma 15.

Proof.

Lemma 16.

Proof.

Proof of Theorem 10

Examples of instances of $\Pi_{2^{m}}^{*}$

2.2 The semi-algebraic set perspective of $\Pi_{2^{m}}^{*}$