This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3299352

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/[Link]

Scaled Fenwick Trees


MATTHEW CUSHMAN¹
¹Ajna Labs (e-mail: matt@[Link], mcushman@[Link])

ABSTRACT A novel data structure that enables the storage and retrieval of linear array numeric data with logarithmic-time updates, range sums, and rescaling is introduced and studied. Computing sums of ranges of arrays of numbers is a common computational problem encountered in data compression, coding, machine learning, computational vision, and finance, among other fields. Efficient data structures enabling log n updates of the underlying data (including range updates), queries of sums over ranges, and searches for ranges with a given sum have been extensively studied (n being the length of the array). Two solutions to this problem are well-known: Fenwick trees (also known as Binary Indexed Trees) and Segment Trees. The new data structure extends these capabilities for the first time to further enable multiplying (rescaling) ranges of the underlying data by a scalar, also in log n time. Scaling by 0 can be enabled, with the effect that subsequent updates may take (log n)^2 time. The new data structure introduced here consists of a pair of interacting Fenwick tree-like structures, one of which holds the unscaled values and the other of which holds the scalars. Experimental results demonstrating performance improvements for the multiplication operation on arrays from a few dozen to over 30 million data points are discussed. This research was done at Ajna Labs in the course of developing a decentralized finance protocol. It enables an efficient on-chain encoding and processing of an order book-like data structure used to manage lending, interest, and collateral.

INDEX TERMS Cumulative Sums, Fenwick Trees, Partial Sums, Prefix Sums, Segment Trees

I. INTRODUCTION

Consider the problem of storing arrays of numbers so that three distinct operations are efficient:
1) Incrementing individual values
2) Calculating cumulative sums over ranges of indices
3) Rescaling values over ranges of indices

Solutions that implement the first two operations in log n time, where n is the length of the list, are known. Two of the most commonly used algorithms are Fenwick Trees and Segment Trees. This paper introduces a novel extension of the Fenwick tree, called a "Scaled Fenwick Tree" (SFT), that supports the rescaling operation as well in log n time. The idea behind the SFT is to store the underlying values in a traditional Fenwick tree and to encode the scalars in a similar, parallel, Fenwick tree-like data structure, with multiplication instead of addition as the binary operation. The latter Fenwick-like tree encodes the scalar multiples that have been applied to ranges of the data itself. The two trees interact, so that incrementing or rescaling a particular value involves traversing both trees in an interleaved manner.

Table 1 summarizes the comparative advantages and disadvantages of three implementations for storage and retrieval of arrays: the naive approach (storing in a linear indexed array), Fenwick Trees, and the new method introduced here, Scaled Fenwick Trees. Each method has its strengths and weaknesses, so the best choice for a given application depends on the frequency and circumstances with which one needs to perform each operation. The constants for the Scaled Fenwick Tree are somewhat worse than those for the Fenwick Tree for the simple query/range sum operations and updates. However, the base Fenwick Tree requires super-linear time for range rescaling, while the SFT is the only method to offer logarithmic time complexity in that case. Section VI gives experimental data that agrees with the theoretical complexities in Table 1.

Fenwick and Ryabko independently discovered what are now known as Fenwick Trees (or, alternatively, Binary Indexed Trees) in [1] and [2] (also see [3]). For both of these initial papers, the motivation came from dynamic data compression, in which the underlying data were frequency tables of some tokens in a stream of data. These problems had been studied extensively in the

VOLUME 10, 2022

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see [Link]

TABLE 1. Comparative Time/Space Complexity

                    Naive    Fenwick Tree    Scaled Fenwick
Query               O(1)     O(log n)        O(log n)
Range Sum           O(1)     O(log n)        O(log n)
Update              O(n)     O(log n)        O(log n)
Range Multiply      O(n)     O(n log n)      O(log n)
Space               n        n               2n

parallel algorithms community [4]. Recently, Bille et al. in [5] consider a data structure storing dynamic partial sums that enables merging adjacent array entries. The theory behind data structures enabling efficient partial summation has been studied as well, with Pătraşcu and Demaine providing tight bounds in [6], and a thorough study of the general structure of storing partial sums by Chaudhuri and Hagerup in [7]. A detailed study of the practical implementation of solutions to the range sum/prefix sum problem, including Fenwick trees and various flavors of Segment trees, is found in [8].

Subsequently, Fenwick trees have found many applications due to their simplicity and efficiency. They are somewhat less general in capability compared to Segment Trees but are more space efficient. Segment Trees are redundant, requiring double the space of a naive array or Fenwick tree. Furthermore, because Fenwick trees rely on standard bit operations for indices in twos-complement format, they are easy and efficient to implement on a wide range of architectures. They have found applications in computer vision and graphics (see [9] [10]) and in statistical regression and kernel estimation (see [11]).

Let ai for i = 1 . . . n be a sequence of numbers that are to be stored in memory. The most natural way to represent this sequence is to store the raw values sequentially in memory as an array B[i] for i = 1 . . . n, so that B[i] stores the value ai. This representation has the merit that changing a value ai for a particular index i is a constant time operation, as is querying the data structure to determine the value of ai (we are using an idealized model of computation in which random memory access is constant time). It suffers two drawbacks in addressing problems (1)-(3), however: both (2) and (3) require O(n) time. To compute the sum of the first k elements, one would need to iterate over all indices up to k, accumulating the sum along the way. A similar process of iterating over the entire array is necessary to scale the first k elements.

One could instead store the values ai as partial sums, setting C[i] to be ∑_{j=1}^{i} a_j. In this representation, querying a particular value in the array is also constant time (ai can be reconstructed as the difference of consecutive values of the array C), and computing a prefix sum becomes trivial. However, updating a particular entry ai becomes expensive: not only must the prefix sum C[i] be updated, but also all subsequent values C[j] for j > i.

In short, there is a distinction between the underlying encoded numerical sequence ai and the actual representation in memory as a data structure. There are many different ways to encode the sequence ai, with different time and space trade-offs. In particular, if updating values is rare, but partial sums are often required, representation C[i] above might be preferable to B[i].

There are data structures that can achieve logarithmic time complexity for both updates and prefix sums, with the trade-off being that simple queries of a particular value themselves also become logarithmic time. One of the most well-studied and efficient was independently discovered by Boris Ryabko and Peter Fenwick ([1] [2]), and is now known as a Fenwick Tree or Binary Indexed Tree (BIT). The idea behind a Fenwick tree is to find a happy medium between the raw representation B[i] and the pure prefix sum representation C[i] discussed above. If, instead, certain well-chosen sums of ranges of values in the array are stored, then to update or query a particular index, one need only access O(log n) elements of the array. This enables updating, querying, and computation of partial sums all in logarithmic time.

There are some other operations to consider on our array that are also enabled by Fenwick trees. Searching the array for a particular prefix sum (i.e., finding the largest index i whose prefix sum ∑_{j=1}^{i} a_j is less than a given value) is important for certain applications. This can also be done in logarithmic time using a Fenwick tree. One might also want to consider range updates: incrementing some prefix of the tree by a value d, effectively replacing aj by aj + d for all j ≤ i. This can also be done, essentially by modeling the prefix sums as piecewise linear functions of the index, and storing the constants and coefficients in separate Fenwick trees (see [12]).

This paper focuses on the problem of scaling the array values by a given scalar value x. The best-known method for scaling an entire Fenwick tree is to iterate through all of the values in the tree and scale each individually. This linear time operation is actually faster than the most straightforward method of using the Fenwick tree update method to update each value, which is O(n log n) (since updates themselves are O(log n)).

Here, a new data structure that enables logarithmic scaling of any initial segment of the tree by nonzero scalars while preserving logarithmic updates, prefix sums, and searches is introduced. The idea behind the algorithm is to maintain two interacting Fenwick tree-like data structures, one of which (the "values array") stores the unscaled values themselves and the other of which (the "scaling array") stores the scale factors. The usual invariant of the Fenwick tree, that values in the array store sums of the raw sequence values over particular ranges of indices, is replaced with a new invariant: each entry in the values array times the

product of entries encountered in the scale factor array as one traverses from the node towards the root in a particular manner, is equal to the sum of scaled values in a particular range. Scaling by zero can be enabled as well, with a hit to the update operation, which becomes O((log n)^2).

This research was done at Ajna Labs in the course of developing a decentralized finance protocol. In this protocol, lenders deposit tokens in an order-book-like structure indexed by price. Computing the amount of deposit above a given price, or finding the price above which a given amount of deposit sits, are both key problems. Furthermore, deposits earn interest, but only if they are priced above a certain level, which was the motivation for the rescaling operation.

II. SYMBOLS AND ABBREVIATIONS

For convenience, Table 2 contains a reference list of commonly used symbols and abbreviations in the text. In all cases, they are also defined or described when introduced.

TABLE 2. Table of Terms and Definitions

Term       Definition
n          Array length
a          Sequence of underlying data to be stored, processed and queried
λi         The place of the leftmost nonzero bit of integer i
ρi         The place of the rightmost nonzero bit of integer i
FR(i)      The range of indices included in the sum stored in index i of a Fenwick tree: {i − 2^ρi + 1, i − 2^ρi + 2, . . . , i − 1, i}
V[i]       The values array of a Scaled Fenwick Tree
S[i]       The scaling array of a Scaled Fenwick Tree
upd(j)     The next index to visit when updating a classic Fenwick Tree; upd(j) = j + 2^ρj
int(j)     The next index to visit when querying a classic Fenwick Tree; int(j) = j − 2^ρj
Upd(j)     The set of iterates of upd applied to j
scale(i)   The total scaling applied to the value at index i of a Scaled Fenwick Tree, equal to ∏_{j∈Upd(i)} S[j]

III. REVIEW OF FENWICK TREES

The following discussion is influenced by Section 4 of [13], which has a detailed discussion of the arithmetic relationships between indices that form the basis for Fenwick trees. As Marchini and Vigna discuss, the term Fenwick "tree" is a misnomer, as there is no single tree-like structure relating the indices to one another. Instead, there are three distinct iteration patterns that are used to increment, query, and search through a Fenwick tree. Below is an overview of Fenwick trees to fix the notation.

All arrays and sequences begin with index 1. This is standard in the Fenwick tree literature, as the index calculations become simpler to express in standard bit arithmetic.

For an integer i, define λi to be the place of the leftmost (most significant) 1 in the binary expansion of i, and let ρi be its rightmost (least significant) 1. For example, 44 is 101100 in binary, so λ44 = 5 and ρ44 = 2.

Let ai be a sequence, the underlying data to be stored, encoded as a Fenwick tree V[i]. The principle of the Fenwick tree is to store at index i of the array the sum of the values from index i − 2^ρi + 1 (one past the index obtained by clearing the least significant bit of i) up to index i. Define FR(i) (the Fenwick Range of i) to be this set of indices:

FR(i) = {i − 2^ρi + 1, i − 2^ρi + 2, . . . , i − 1, i}    (1)

V[i] = a_{i−2^ρi+1} + a_{i−2^ρi+2} + · · · + a_i    (2)

For example, if i is odd, FR(i) = {i} and V[i] = ai. If i is a power of 2, then FR(i) = {1, 2, ..., i} and V[i] is the entire prefix sum up to and including ai. If i ≡ 2 (mod 4) then FR(i) = {i − 1, i} and V[i] = a_{i−1} + a_i.

The following facts are easily verified and are the key observations explaining how Fenwick trees work:

FT.1 i ∈ FR(i)
FT.2 For all i, j, i ≠ j implies FR(i) ≠ FR(j). Also, either FR(i) ⊂ FR(j), or FR(j) ⊂ FR(i), or FR(i) ∩ FR(j) = ∅
FT.3 j ∈ FR(i) if and only if i can be obtained from j by iterating the update function upd(j) := j + 2^ρj.
FT.4 Let the interrogation function int(j) be the integer obtained by clearing the least significant bit of j's binary expansion: int(j) := j − 2^ρj. The set of positive integers up to and including i is partitioned into the sets FR(j), where j is obtained by iterating int starting at i and ceasing once obtaining 0.

The functions upd and int were introduced in [13] to streamline the discussion of the procedures to update and interrogate Fenwick trees. In order to increment an underlying value ai stored in a Fenwick tree (an "update" call) while preserving the invariant (2), one can use property FT.3: increment the value stored in location j of the Fenwick tree itself, V[j], for j being any iterate of the update function upd starting at i. Let Upd(i) be the set of these indices obtained by iterating upd on i. There are at most log n such numbers less than n, hence the iteration finishes in logarithmic time. Figure 1 illustrates an example of this. The indices are listed in the bottom row of boxes, and the raw underlying data ai in the row of boxes above that. The Fenwick tree data itself is stored above, with single solid arrows showing the upd function and dashed arrows the int function. The double solid arrows show the path that the update algorithm would traverse in order to increment the value stored at index 5.

Similarly, using FT.4, one obtains the prefix sum of the underlying array by summing V[j] for j being any iterate of int applied to i. There are at most log i such nonzero iterates. An example is given in Figure 2.
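To make the preceding review concrete, the following is a minimal classic Fenwick tree in Python. This is an illustrative sketch of the standard structure described above, not code from the paper (the class and method names are ours): update follows the upd iteration of FT.3, and prefix_sum follows the int iteration of FT.4.

```python
class FenwickTree:
    """Classic Fenwick tree over a_1..a_n (1-indexed), as in Section III."""

    def __init__(self, n):
        self.n = n
        self.V = [0] * (n + 1)  # V[i] holds the sum of a_j for j in FR(i)

    def update(self, i, delta):
        # FT.3: increment V[j] for every j in Upd(i), i.e. every node
        # whose Fenwick range FR(j) contains i.
        while i <= self.n:
            self.V[i] += delta
            i += i & (-i)  # upd(i) = i + 2^rho_i (add the lowest set bit)

    def prefix_sum(self, i):
        # FT.4: the ranges FR(j) for the int-iterates of i partition 1..i.
        total = 0
        while i > 0:
            total += self.V[i]
            i -= i & (-i)  # int(i) = i - 2^rho_i (clear the lowest set bit)
        return total


# Example: store a_i = i for i = 1..16.
ft = FenwickTree(16)
for i in range(1, 17):
    ft.update(i, i)
print(ft.prefix_sum(5))   # 15  (1+2+3+4+5)
print(ft.prefix_sum(16))  # 136
```

Both loops touch at most log n nodes, matching the Fenwick Tree column of Table 1.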

FIGURE 1. Update item 5 by adding 1

FIGURE 2. Interrogate sum items 1 to 15
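The traversal paths drawn in Figures 1 and 2 can be reproduced with a few lines of Python. This is an illustrative sketch (the function names are ours, and we assume a 16-element tree as the figures suggest); the interrogation function is named intr here to avoid shadowing Python's built-in int.

```python
def upd(j):
    # upd(j) = j + 2^rho_j: add the lowest set bit (next node on the update path)
    return j + (j & -j)

def intr(j):
    # int(j) = j - 2^rho_j: clear the lowest set bit (next node on the query path)
    return j - (j & -j)

def update_path(i, n):
    # Upd(i): the iterates of upd starting at i that stay within the bound n (FT.3)
    path = []
    while i <= n:
        path.append(i)
        i = upd(i)
    return path

def query_path(i):
    # The iterates of int starting at i, stopping at 0; the ranges FR(j)
    # for these j partition {1, ..., i} (FT.4).
    path = []
    while i > 0:
        path.append(i)
        i = intr(i)
    return path

print(update_path(5, 16))  # [5, 6, 8, 16] -- the nodes incremented in Figure 1
print(query_path(15))      # [15, 14, 12, 8] -- the nodes summed in Figure 2
```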

IV. SCALING FENWICK TREES: NONZERO SCALARS

We now move on to the main result of this paper: enabling efficient rescaling of ranges of the underlying data as well. For example, suppose the elements in the underlying array ai correspond to some statistical observations that fall in particular buckets indexed by i. In order to translate the observations into a probability distribution, one would need to rescale the array by the sum of the entire array to ensure that the sum of values is 1.

The most naive algorithm to do this for data represented in a Fenwick tree would be of order n log n, as follows. Let f be the factor by which the user wants to rescale every element ai. They could iterate through the array, adding (f − 1) · ai to the element ai for i = 1 · · · n. Since the query and update operations are O(log n) and they would need to do this n times, the entire process would be O(n log n). An obvious improvement would be to just iterate directly on the underlying tree itself, scaling each element V[i] by the factor f, in O(n) time.

Starting with any data structure for representing arrays ai that supports range sums and updates, one can augment the data structure with a global scalar s, which is interpreted as "all elements of array ai are scaled by s". This would enable global rescaling even more simply and efficiently, in O(1) time. To compute the sum over any desired range, one simply scales the sum by s (relying on the distributive property). To increment or update ai by a value z, call the increment/update function on the underlying data structure with s^{-1} · z, which works fine

as long as s is nonzero.

One can generalize this even further by storing the scaling factors themselves in a Fenwick tree-like data structure in parallel to the data that stores the partial sums. Augment the Fenwick tree data V[i] with scaling factors S[i] for i = 1 . . . n. Intuitively, the value in S[i] is regarded as a scaling factor that has been applied to all members of the underlying data a_j for j ∈ FR(i). The Fenwick tree invariant (2), that V[i] = ∑_{j∈FR(i)} a_j, is replaced with a more complex invariant involving the scaling factors. Define the scale factor of element i as the product of the factors stored in S[j], as j traverses the Fenwick tree from i to the root along the same paths used to update the tree:

scale(i) = ∏_{j∈Upd(i)} S[j]    (3)

We then maintain the invariant:

scale(i) · V[i] = ∑_{j∈FR(i)} a_j    (4)

The V[i] array and S[i] array act much like Fenwick trees, but with additional interwoven structure. The V[i] are partial sums of the underlying a_j data up to a scalar factor. The S[i] encode the scaling factors themselves. The quantity by which one should scale V[i] is obtained by starting with S[i] and iterating up the tree to the root node, accumulating the scaling factors multiplicatively.

Figure 3 shows a representation of a scaled Fenwick tree analogous to the figures presented earlier for standard Fenwick trees. The scaling factors are listed below the values and are initially all set to 1.

Examples:
1) Suppose n = 2^m for some m, and the user wants to compute the prefix sum of values up to 2^k for k ≤ m. In a standard Fenwick tree, the value stored in V[2^k] is the entire prefix sum of the underlying data values up to and including 2^k: V[2^k] = a_1 + a_2 + · · · + a_{2^k}, so the value is simply V[2^k]. In a scaling Fenwick tree, this value is scaled by the product of the scale factors stored in the S array as one traverses along the path up the tree from the node to the root, along the increment paths from 2^k to 2^m. These are the powers of 2 between those two indices, so the return value is S[2^k]*S[2^(k+1)]*...*S[2^m]*V[2^k].
2) Again, let n = 2^m, and suppose that the user wants to compute the prefix sum of values up to 2^k + 2^j for some j < k ≤ m. In a standard Fenwick tree the value stored in V[2^k+2^j] is the sum of values between 2^k + 1 and 2^k + 2^j: V[2^k+2^j] = a_{2^k+1} + a_{2^k+2} + · · · + a_{2^k+2^j}. One can add this to V[2^k] to reconstruct the entire prefix sum, so the value returned is V[2^k]+V[2^k+2^j]. In a scaling Fenwick tree, these values need to be scaled. As the algorithm proceeds up from 2^k + 2^j towards the root, it first passes through the nodes 2^k + 2^{j+1}, 2^k + 2^{j+2}, up to 2^k + 2^{k−1}.

Let the arrays V[i] and S[i] satisfying invariant (4) be given. One can then construct an entire prefix sum to i in a manner similar to the standard Fenwick tree prefix sum algorithm. As in a standard Fenwick tree, the entire prefix sum is broken into sums over at most log n subintervals. The new wrinkle is that each of these subinterval sums must be multiplied by the appropriate scale factor as in (4). These scale factors are stored in the Fenwick tree structure S[i] that parallels the structure of the partial sums V[i], and so they themselves can be accumulated (multiplicatively) alongside the partial sums. Because the factors accumulate as one moves down the tree, it is more efficient to write this algorithm as moving from the root of the tree down towards the leaves, rather than the usual Fenwick tree prefix sum implementation, which iterates from the deeper nodes towards the root.

Below is Python-like pseudo-code implementing the prefix sum algorithm, given tree data V[i] and S[i], and index index. The variable i traverses the tree downwards from the root towards the target index for the prefix sum. This is accomplished by reconstructing index bit by bit, starting with the most significant bit, in contrast with the usual Fenwick algorithm, which starts with index and clears it bit by bit, starting with the least significant bit. The new algorithm increments the sum at the same indices as in the standard Fenwick tree prefix sum algorithm, but in reverse order. It also needs to visit some intermediate nodes, however, to track the scale factor itself. For example, in order to compute the prefix sum up to index 5, the algorithm needs to consider not just the values and scales stored at indices 4 and 5 as in a standard Fenwick tree, but also the scale factor stored at 6 (as well as 8, and any higher power of 2).

1  def prefixSum(values[], scales[], index):
2      runningSum=0
3      scale=1
4      j=1 << maxNumberOfBitsInIndex
5      i=0
6      while j>0:
7          if index&j:
8              runningSum+=scale*scales[i+j]
9                  *values[i+j]
10         else:
11             scale *= scales[i+j]
12         i=i+(index&j)
13         j=j>>1
14     return runningSum

Listing 1. Prefix Sum

In order to increment a specific value in the scaled Fenwick tree at index index, one needs only to update the values in Upd(index). In the standard Fenwick tree, this is done by traversing the tree upwards using upd starting at index, but in this case, how much to

FIGURE 3. Scaled Fenwick Tree

increment the value V[i] by is unknown, because it has been scaled by scale(index), which is the product of all the S[j] as j traverses the path from index to the root along the update path. One could compute this explicitly at the outset, but this would require a redundant traversal. Instead, reversing direction and traversing downwards from the root to index avoids this redundancy. Accumulate the scale factors in runningScale along the traversal. Because the value v is added to the value stored at location ii + j in the code below, which is included in the partial sum scaled by runningScale, it is necessary to divide by runningScale when incrementing the value array.

1  def increment(values[], scales[], index, v):
2      j=1 << maxNumberOfBitsInIndex
3      ii=0
4      runningScale=1
5      while j>0:
6          if (index-1)&j:
7              ii+=j
8          else:
9              runningScale *= scales[ii+j]
10             values[ii+j]+=v/runningScale
11         j = j >> 1
12     return (values, scales)

Listing 2. Increment

Now consider the algorithm to scale a prefix range of values itself. To scale every entry up to index by a number factor, one could partition {1, . . . , index} into subranges as in FT.4. Each one of these subranges can be implicitly scaled by applying the scaling factor to the appropriate entry in the scaling array S[j]. This works well for maintaining invariant (4) for index itself, but alone would cause the resulting data to violate the same invariant (4) for other indices that overlap but aren't contained in 1 . . . index. For example, rescaling the values up to index 5 by only changing the scale factor stored in entries 4 and 5 is insufficient: subsequent queries for the sum up to 6 would still reflect the unscaled value at index 5, as this is stored as part of the sum in index 6. The correct algorithm needs to adjust the values stored in these overlapping indices as well.

This can be done by not only traversing upwards through the indices by flipping successive least significant bits to 0 in the binary expansion of index, as done in increment, but by also including intermediate indices that have a single 0 flipped to a 1. The code mult below does this by starting with j as the least significant bit of index and iteratively shifting it left. The variable runningSum stores the total increase in the sum below index in the tree at each loop. If index has the corresponding bit set to 1, execute the "if" part (lines 6-8), which scales the subtree below index and accumulates in runningSum how much it was incremented; also flip the bit of index to 0, as in increment. If the corresponding bit in index is set to 0, the index is in the overlapping interval case similar to index 6 in the example above. Then increment the corresponding value array element by runningSum, and update runningSum itself by the corresponding scale factor so that it remains accurate further up the tree.

Figure 4 shows an example of this operating on the SFT presented earlier. The red boxes and blue boxes are the nodes visited when multiplying the 9th entry by 3. Red boxes correspond to the "if" clause, while blue boxes correspond to the "else" clause.

1  def mult(values, scales, i, factor):
2      runningSum=0
3      j=i&(-i)
4      while j<=maxIndex:
5          if i&j:
6              runningSum+=(factor-1)*scales[i]


7                  *values[i]
8              scales[i]*=factor
9              i-=j
10         else:
11             values[i+j]+=runningSum
12             runningSum*=scales[i+j]
13         j = j << 1

Listing 3. Multiply

Below is the pseudo-code for the inverse prefix sum function as well. This searches the tree for the least index whose sum up to and including it doesn't exceed target. It operates very similarly to the analogous search function for standard Fenwick trees, with the addition that it accounts for the scale factor along the way.

1  def inversePrefixSum(values, scales, target):
2      i = 1 << maxNumberOfBitsInIndex
3      runningSum = 0
4      runningScale = 1
5      runningIndex = 0
6      while i > 0:
7          if runningSum
8              +runningScale*values[runningIndex+i]
9              *scales[runningIndex+i]
10             < target:
11             runningIndex+=i
12             runningSum+=runningScale
13                 *values[runningIndex]
14                 *scales[runningIndex]
15         else:
16             runningScale*=scales[runningIndex+i]
17         i=i>>1
18     return runningIndex

Listing 4. Inverse Query

V. SCALING FENWICK TREES: ALLOWING FOR ZEROS

The previous section describes a system for scaling ranges of an array of numbers by a nonzero scalar. What happens to this algorithm if zero is passed into the mult function? The call will traverse through certain nodes in the tree, multiplying the entries in the scaling array by 0 – which, of course, merely sets them to the value 0. This implicitly encodes the invariant that effectively says "all values below this in the tree are 0". The only mechanism in the above algorithms to modify the scaling array further is an additional call to mult, which can only multiply the entries of the scaling array by subsequent scaling values. Since there is nothing one can multiply 0 by to get a nonzero value, there is no possible way to increment an entry to a nonzero value once it has been set to zero in this way. Another way to see this problem is on line 10 of the pseudo-code for increment, which divides by the value in the scaling array. If this value has been set to zero, the algorithm will fail with a division by 0.

The issue is this division in increment. The existing algorithm does work to scale ranges by zero but breaks subsequent calls to increment values in that range. In the increment code, the variable runningScale tracks the implicit scale factor that is applied to all entries in the V[i] array below it in the tree. In order to allow a particular element to be incremented below that, one could reset any 0's encountered in the scaling tree to 1, and then set its value array entry to 0, as well as the scale factor of all of its children to 0. This is simply another way of encoding the data "all entries below this index are 0". This reset will allow the division to proceed.

In order for this to work, O(log n) entries in the scaling array need to be set to 0. As the process traverses down the tree, it will continue to encounter these 0's, setting their children to 0 as well. The resulting algorithm has time complexity O((log n)^2).

1  def incrementAllowingZeros(values, scales, index, x):
2      j=1 << maxNumberOfBitsInIndex
3      ii=0
4      runningScale=1
5      while j>0:
6          if (index-1)&j:
7              ii+=j
8          else:
9              if scales[ii+j]==0:
10                 scales[ii+j]=1
11                 values[ii+j]=0
12                 k=j-1
13                 while k>0:
14                     scales[ii+k]=0
15                     k-=k&(-k)
16             runningScale*=scales[ii+j]
17             values[ii+j]+=x/runningScale
18         j = j >> 1
19     return (values, scales)

Listing 5. Increment Allowing Zeroes

It is possible to modify these algorithms (at least in the nonzero scalar case) to enable the preservation of zeros. Add a new function, obliterate, that can force a value to be 0 "on the nose". This addresses the issues with the first example above. One can further modify the other functions that change state, increment and mult, so that they preserve zero values. The following is a useful alternative characterization of a given index's value "being zero" in an SFT: compute the value at index i by taking the difference between two adjacent prefix sums. These sums are computed by summing the appropriate intervals given in FT.2, scaled by the appropriate factors as encoded in the tree. For two adjacent indices i − 1 and i, many of these intervals (and the associated scale factors) will coincide. The difference comes in index i itself: the prefix sum up to and including i will include as a term the sum over FR(i) (computed as scale(i)*V[i]), while that for i − 1 will include the sums over FR(i − j), where j is a power of 2 less than 2^ρi (each summand computed as scale(j)*V[j]). The scale factors scale(j) that apply to all of the intervals in the latter sum are products of the scale factors encountered as one traverses from index j to the root. These are precisely S[j] times the same set of scale factors that appear in the product expansion of scale(i). Therefore, an alternative characterization of
VOLUME 10, 2022 7

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see [Link]
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3299352

FIGURE 4. Scaling Fenwick Tree Index 9 by 3

an index i having value 0 in the SFT is:

        V[i] = Σ_{a < ρ(i)} S[i − 2^a] * V[i − 2^a]        (5)

By forcing this equality, one can set the value at index i to 0. Furthermore, if one can always update the tree in such a way that this equality continues to hold after the update whenever it held prior, then zero values will be preserved across updates.

Equation 5 dictates how to write obliterate: compute the right-hand side, and put it in V[i]. However, there is one subtlety: after zeroing out the target index, those changes must propagate up the tree to the other nodes that contain i in their range, and do so in such a way that preserves zeros. The key insight is that when modifying a value V[i] in the SFT by adding or subtracting a given quantity, this difference can propagate up the tree node by node, computing what would be needed to change each node's parent, which is then incremented appropriately. The algorithm stores that difference and continues iterating up the tree. As the algorithm traverses the tree upwards to modify node i, it will first have visited one of its children, which will be one of the terms on the right-hand side of (5). Then, modify V[i] by S[i] times the delta applied to the child, which will preserve criterion (5).

Below is pseudo-code for the obliterate operation. Lines 2-6 below compute the difference between the left-hand side and right-hand side of (5). Lines 7-11 then apply this difference to the left-hand side of (5) and propagate the difference up the tree.

1 def obliterate(values, scales, i):
2     j = 1
3     runningSum = -values[i]
4     while j & i == 0:
5         runningSum += scales[i-j] * values[i-j]
6         j = j << 1
7     while i <= maxIndex:
8         newValue = values[i] + runningSum
9         runningSum = newValue*scales[i] - values[i]*scales[i]
10        values[i] = newValue
11        i += i & (-i)

Listing 6. Obliterate

It would be tempting in line 9 of obliterate to simply set runningSum = runningSum*S[i], since distributing S[i] over V[i]+runningSum makes this look obvious. However, this would violate the rounding criteria, as the precise change in V[i] must be preserved to propagate up the tree further, as discussed above.

Function increment needs to change as well and must iterate node by node upwards towards the root to ensure that the differences to every node are propagated.

1 def incrementPreservingZeros(values, scales, i, x):
2     x = x / getScale(scales, i)
3     while i <= maxIndex:
4         newValue = values[i] + x
5         x = newValue*scales[i] - values[i]*scales[i]
6         values[i] = newValue
7         i += i & (-i)

Listing 7. Increment Preserving Zeros

The function getScale computes scale(i) by accumulating the product of entries in S while traversing the tree from i to the root.

Similar tricks are at play for mult. Recall from the discussion of mult above that, in addition to applying the new scale factor to various elements of the scaling array S, specific overlapping values of the values array V also need to be updated. Line 6 below computes that delta, and the else branch of the loop applies it consistently to the overlapping intervals.
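Before moving on, the exact-delta propagation that obliterate relies on can be checked in the degenerate case where every scale factor is 1, i.e., on a plain Fenwick tree. The runnable sketch below mirrors the structure of Listing 6 (recover the value at i from the sibling nodes, the right-hand side of (5) with S ≡ 1, then push that exact delta to i and each ancestor); the helper names are ours, not the paper's reference code.

```python
# Obliterate on an unscaled Fenwick tree (all scale factors equal 1).
# Illustrative sketch only; it mirrors Listing 6 with the scale
# bookkeeping removed.

def prefix_sum(tree, i):
    """Standard Fenwick prefix sum over indices 1..i."""
    s = 0
    while i > 0:
        s += tree[i]
        i -= i & (-i)
    return s

def obliterate_unscaled(tree, i, max_index):
    # Recover -(value at i): with S == 1, the right-hand side of (5)
    # is the sum of the sibling nodes i - 2^a below the low bit of i.
    running = -tree[i]
    j = 1
    while j & i == 0:
        running += tree[i - j]
        j <<= 1
    # Apply the exact delta to i and every ancestor whose range
    # contains i, so all prefix sums see the entry as 0.
    while i <= max_index:
        tree[i] += running
        i += i & (-i)
```

For example, over the array [3, 1, 4, 1, 5] the Fenwick representation is [_, 3, 4, 4, 9, 5]; obliterating index 3 zeroes the stored 4 and reduces the ancestor at index 4 from 9 to 5, so subsequent prefix sums behave as if the third entry were 0.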

1 def multPreservingZeros(values, scales, i, factor):
2     runningSum = 0
3     j = i & (-i)
4     while j <= maxIndex:
5         if i & j:
6             runningSum += values[i]*scales[i]*factor - scales[i]*values[i]
7             scales[i] *= factor
8             i -= j
9         else:
10            values[i+j] += runningSum
11            runningSum = scales[i+j]*values[i+j] - scales[i+j]*(values[i+j] - runningSum)
12        j = j << 1

Listing 8. Multiply Preserving Zeros

VI. EXPERIMENTAL RESULTS
Experimental results comparing the time performance of Scaling Fenwick Trees to both a naive implementation and a standard "Base Fenwick" tree implementation of the array interface, including updating, interrogation, and scaling, are shown in Figure 5 and Figure 6. The raw numerical values are included in Table 3 and Table 4.

The data show a large decrease in scaling times for the Scaling Fenwick Tree. This improvement is offset by small increases for updates as compared to either alternative implementation. For prefix sum queries, the scaled Fenwick tree performs slightly worse than the baseline Fenwick tree, but both tree implementations significantly outperform the naive implementation. This pattern holds both for average and worst-case experimental statistics. All of these empirical results are consistent with the expectations based on the theoretical analysis of the algorithms (namely, that SFTs would offer the best performance for rescaling, with the tradeoff of slightly worse performance for updates and range sum queries).

Python3 code for the Scaled Fenwick Tree and both alternative implementations is available as a reproducible run on Code Ocean in [14]. While the particular experimental results cited here are for a particular desktop machine (which is described below), the results available in [14] agree with these results and can be easily reproduced on Code Ocean.

In considering the reported run-time results, it is essential to bear in mind the specific details of the machines and the implementations used in the experiments. The observed timing, for instance, could be influenced by factors such as the memory model, including cache utilization and cache coherence. Modern processors make extensive use of caches, and data locality can significantly affect performance. Therefore, an algorithm that makes efficient use of cache can often outperform a theoretically faster algorithm that does not. In the case of the Python implementation, the presence of automatic memory management, or garbage collection, could also impact performance. Garbage collection pauses, often unpredictable, can add significant overhead in terms of time, particularly for programs that create and discard many objects.

Nonetheless, we see consistent behavior across a wide range of array sizes, from a few dozen to tens of millions of indices. The behavior of the average times and the extremal (worst-case) times is consistent as well. This provides assurance that these theoretical and empirical results are accurate representations of a typical implementation.

Furthermore, SFTs have been implemented in the Ethereum Virtual Machine using Solidity, a programming language designed for implementing smart contracts running on the Ethereum Virtual Machine, as part of the Ajna Protocol open source project. Run time complexity in Solidity is best measured using gas utilization, and the memory model is very different from a typical x86-based architecture. Despite the entirely different environment and constraints, the results obtained from this implementation were consistent with the theoretical expectations, demonstrating the SFT's adaptability and the consistency of the results discussed above. While the particular empirical results discussed in detail here are influenced by many system and implementation factors, the fundamental efficiency of the algorithm as predicted by its theoretical time complexity manifests consistently across diverse platforms.

Figure 5 is a log plot of the average execution time in milliseconds versus array length for all nine pairs of operation ("update," "query," "multiply") and implementation ("naive," "baseline Fenwick," "Scaled Fenwick"). These were tested using the Python implementation discussed above on an AMD Ryzen 9 5950 at 3.7 GHz running Ubuntu Linux version 22.04. The x-axis is log2 of the array length, so the longest arrays were of length 2^25 = 33,554,432. Each table value is an average of 100 runs for the paired operation and implementation, with the index randomly sampled up to the array length. Identical operations (values and array indices) were used for each of the three different implementations for each operation.

The maximal data among the same 100 runs for each condition are contained in Figure 6. The worst-case execution times of the scaled Fenwick tree for these three operations for arrays of length 2^25 = 33,554,432 are under a third of a second.

VII. CONCLUSION
Scaled Fenwick Trees are a novel data structure and a suite of algorithms that enable efficient manipulation of numerical array data. Updates, range sums, searches, and multiplying arbitrary ranges of values by nonzero scalars can all be implemented in time logarithmic in the length of the array. The data structure is space redundant and requires storing two numerical values for every array entry. This research was motivated by a

TABLE 3. Mean Execution Time Comparison for Naive Versus Baseline Fenwick Versus Scaled Fenwick

         Update Times (ms)            Query Times (ms)             Scale Times (ms)
         Scaled             Baseline  Scaled             Baseline  Scaled             Baseline
log2(n)  Fenwick   Naive    Fenwick   Fenwick   Naive    Fenwick   Fenwick   Naive    Fenwick
1 0.0156 0.0034 0.0036 0.0033 0.0006 0.0004 0.0152 0.0093 0.0158
3 0.0260 0.0035 0.0038 0.0135 0.0008 0.0005 0.0296 0.0177 0.0353
5 0.0360 0.0034 0.0040 0.0220 0.0022 0.0007 0.0495 0.0516 0.1195
7 0.0429 0.0034 0.0041 0.0299 0.0068 0.0007 0.0623 0.2016 0.4764
9 0.0509 0.0033 0.0043 0.0377 0.0259 0.0009 0.0784 0.7324 1.8006
11 0.0610 0.0034 0.0045 0.0474 0.1152 0.0010 0.0990 2.9509 7.4433
13 0.0701 0.0033 0.0047 0.0567 0.4223 0.0012 0.1168 11.1681 29.1188
15 0.0778 0.0033 0.0049 0.0636 1.5733 0.0013 0.1322 50.0891 134.1686
17 0.0893 0.0034 0.0052 0.0720 6.5255 0.0014 0.1513 176.3270 482.9354
19 0.0974 0.0035 0.0053 0.0811 26.3857 0.0016 0.1654 694.6900 1953.3862
21 0.1043 0.0035 0.0056 0.0914 109.3884 0.0018 0.1844 3166.3422 9177.6498
23 0.1142 0.0035 0.0061 0.0974 347.2394 0.0023 0.2031 11600.0058 34277.0195
25 0.1225 0.0035 0.0067 0.1077 1620.3959 0.0030 0.2158 47479.3940 143150.2115
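The scale columns of Table 3 make the headline improvement concrete. A quick check of the ratios, using the mean rescaling times from the last row (log2(n) = 25):

```python
# Mean scale (rescaling) times at log2(n) = 25, in ms, taken from
# the last row of Table 3.
scaled_ms, naive_ms, baseline_ms = 0.2158, 47479.3940, 143150.2115

# The Scaled Fenwick Tree rescales a 33M-entry array several orders
# of magnitude faster than either alternative on this benchmark.
speedup_vs_naive = naive_ms / scaled_ms        # roughly 2.2e5
speedup_vs_baseline = baseline_ms / scaled_ms  # roughly 6.6e5
```

These ratios are specific to the benchmark machine and Python implementation described in Section VI, but they illustrate the asymptotic gap (O(log² n) rescaling versus O(n)) at the largest tested size.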

TABLE 4. Maximum Execution Time Comparison for Naive Versus Baseline Fenwick Versus Scaled Fenwick

         Update Times (ms)            Query Times (ms)             Scale Times (ms)
         Scaled             Baseline  Scaled             Baseline  Scaled             Baseline
log2(n)  Fenwick   Naive    Fenwick   Fenwick   Naive    Fenwick   Fenwick   Naive    Fenwick
1 0.0378 0.0086 0.0089 0.0055 0.0016 0.0014 0.0184 0.0128 0.0191
3 0.0455 0.0085 0.0041 0.0183 0.0016 0.0010 0.0439 0.0265 0.0531
5 0.0508 0.0037 0.0045 0.0293 0.0039 0.0012 0.0680 0.0921 0.2128
7 0.0657 0.0037 0.0049 0.0364 0.0140 0.0013 0.0813 0.3505 0.8150
9 0.0734 0.0037 0.0049 0.0468 0.0498 0.0013 0.1025 1.3725 3.3697
11 0.0987 0.0037 0.0054 0.0569 0.2016 0.0015 0.1267 5.8937 14.0704
13 0.1074 0.0038 0.0054 0.0706 0.7839 0.0018 0.1505 22.3857 57.4312
15 0.1181 0.0040 0.0059 0.0767 3.2447 0.0020 0.1614 89.5844 238.2480
17 0.1339 0.0040 0.0064 0.0882 12.9848 0.0024 0.1857 364.3869 981.7368
19 0.1533 0.0044 0.0065 0.0934 51.4584 0.0026 0.2038 1443.0576 4019.2751
21 0.1598 0.0048 0.0084 0.1037 204.3815 0.0029 0.2155 5738.4169 16565.4410
23 0.1588 0.0047 0.0086 0.1117 809.4136 0.0043 0.2352 22613.8160 66781.0546
25 0.1691 0.0047 0.0086 0.1276 3298.4628 0.0059 0.2608 91133.6823 271345.1379
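The measurement procedure behind Tables 3 and 4 (100 runs per condition, indices randomly sampled up to the array length, means and maxima reported in milliseconds) can be sketched as a small harness. This is an illustrative skeleton with our own names, not the benchmark code of [14].

```python
# Minimal timing harness in the spirit of Section VI: mean and
# worst-case time per operation over repeated runs with randomly
# sampled indices. Illustrative only.
import random
import timeit

def naive_prefix_sum(arr, i):
    # O(n) baseline implementation: sum the first i entries directly.
    return sum(arr[:i])

def benchmark(op, n, runs=100, seed=0):
    rng = random.Random(seed)
    arr = [rng.random() for _ in range(n)]
    # Sample indices once so the same operations can be replayed
    # against each implementation, as in the experiments.
    indices = [rng.randrange(1, n + 1) for _ in range(runs)]
    times_ms = []
    for i in indices:
        t0 = timeit.default_timer()
        op(arr, i)
        times_ms.append((timeit.default_timer() - t0) * 1000.0)
    return sum(times_ms) / runs, max(times_ms)  # (mean, max) in ms
```

Replaying the identical index sequence against each implementation makes the mean and maximum columns directly comparable across rows.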

FIGURE 5. Plot of Log Average Execution Time (ms) Versus log2(n)
FIGURE 6. Plot of Log Maximum Execution Time (ms) Versus log2(n)

particular problem in the management of a database of loans and lenders for a blockchain-based decentralized finance application. Similarly structured problems present themselves in coding and compression, data analysis, filtering and sorting, and other areas, however, so this research may find application well beyond its original motivation. Experimental results show that this algorithm enables sub-second updates, range sums and range rescalings for linear numerical array data of tens of millions of data points on common desktop consumer hardware.

Scaled Fenwick Trees do come with some drawbacks. There is space redundancy in the form of an additional scaling array, so that twice the memory usage is necessary to hold the same number of data points as compared to either a straightforward naive array or a classical Fenwick tree. As with classical Fenwick Trees or Segment Trees, updating an array value requires logarithmic, not linear, time in the array length. Finally, compared to a classical Fenwick tree, both updates and range sums require additional computation to incorporate the values in the scaling array, so that while both methods are log-time complexity, the constants are worse for the scaled Fenwick tree.

Overall, for efficient implementation of all three operations (updates, range sums, and range rescaling of linear array data), Scaled Fenwick Trees offer significant advantages with reasonable offsetting disadvantages. In applications that require frequent rescalings in particular, Scaled Fenwick Trees can be a good choice of data structure to store and process data.

MATTHEW CUSHMAN
Matthew Cushman received the B.S. degree in mathematics and logic and computation, and the M.S. degree in mathematics, from Carnegie Mellon University in Pittsburgh, PA, and the Ph.D. degree in mathematics from the University of Chicago. He was Managing Director at Knight Capital Group from 2002 to 2011, Senior Managing Director at Citadel Securities from 2011 to 2013, and a co-founder of Engineers Gate in 2014. He left Engineers Gate in 2017 to found Etale, Inc., a trading software firm that was acquired by NYDIG in 2020. From January 2022 to the present, he has been a co-founder of Ajna Labs, where he works on smart contract protocol design and implementation.

VIII. ACKNOWLEDGEMENT
The author would like to acknowledge valuable conver-
sations with Shiva Chaudhuri, Mike Hatheway, George
Niculae, Ed Noepel, and Sebastiano Vigna.

REFERENCES
[1] Peter M. Fenwick. A new data structure for cumulative fre-
quency tables. Software: Practice and Experience, 24(3):327–
336, 1994.
[2] Boris Ryabko. A fast on-line code. Soviet Math. Dokl.,
39(3):533–537, 1989.
[3] B.Y. Ryabko. A fast on-line adaptive code. IEEE Transac-
tions on Information Theory, 38(4):1400–1404, 1992.
[4] Guy E. Blelloch. Prefix sums and their applications. In J. H.
Reif, editor, Synthesis of Parallel Algorithms, 1990.
[5] Philip Bille, Anders Roy Christiansen, Patrick Hagge Cord-
ing, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel
Vildhøj, and Søren Vind. Dynamic relative compression,
dynamic partial sums, and substring concatenation. Algo-
rithmica, 80(11):3207–3224, 2018.
[6] Mihai Pătraşcu and Erik D. Demaine. Lower bounds for
dynamic connectivity. In Proceedings of the Thirty-Sixth
Annual ACM Symposium on Theory of Computing, STOC
2004, pages 546–553, New York, NY, USA, 2004. Association
for Computing Machinery.
[7] Ernst W. Mayr, Gunther Schmidt, and Gottfried Tinhofer,
editors. Prefix graphs and their applications, Berlin, Heidel-
berg, 1995. Springer Berlin Heidelberg.
[8] Giulio Ermanno Pibiri and Rossano Venturini. Practical
trade-offs for the prefix-sum problem. Software: Practice and
Experience, 51, 10 2020.
[9] Christian Reinbold and Rüdiger Westermann. Parameterized splitting of summed volume tables. Computer Graphics Forum, 40(3):123–134, 2021.
[10] Jens Schneider and Peter Rautek. A versatile and efficient
gpu data structure for spatial indexing. IEEE Transactions on
Visualization and Computer Graphics, 23(1):911–920, 2017.
[11] Yining Wang, Yi Wu, and Simon S. Du. Near-linear time local polynomial nonparametric estimation with box kernels. INFORMS Journal on Computing, 33(4):1339–1353, 2021.
[12] Pushkar Mishra. On updating and querying sub-arrays of
multidimensional arrays. CoRR, abs/1311.6093, 2013.
[13] Stefano Marchini and Sebastiano Vigna. Compact fenwick
trees for dynamic ranking and selection. Software: Practice
and Experience, 50, 01 2020.
[14] Matt Cushman. Scaled fenwick tree reference imple-
mentation, validation, time benchmarks and comparisons.
[Link] 4 2023.

