Blockchain Goes Green? An Analysis of Blockchain on Low-Power Nodes
Dumitrel Loghin∗, Gang Chen‡, Tien Tuan Anh Dinh∗, Beng Chin Ooi∗, Yong Meng Teo∗
∗National University of Singapore, ‡Zhejiang University
∗[dumitrel,dinhtta,ooibc,teoym]@comp.nus.edu.sg, ‡[email protected]
arXiv:1905.06520v2 [cs.DC] 17 Jun 2019
not only the analysis, but our experience of running blockchain on various system architectures.

• We show that low-power ARM-based systems struggle to run full-fledged blockchain workloads, mainly due to insufficient memory size and bandwidth. For example, the low-end Raspberry Pi 3 wimpy node is unable to run Ethereum, and it requires non-trivial code modifications and special configuration to run Hyperledger.

• We show that systems with the lowest power profile do not necessarily achieve the best energy efficiency. For example, Jetson TX2 is more energy-efficient than Raspberry Pi 3, even if the latter has a lower power profile.

• We show that wimpy nodes can achieve reasonable performance while saving significant amounts of energy. For example, eight Jetson TX2 nodes trade 17% and 72% of Parity and Hyperledger throughput, respectively, for 18× and 23× lower energy consumption compared to eight Xeon nodes.

• Our analysis of Ethereum performance leads to an insight into the design trade-off in newer Ethereum releases compared to the older ones used in [11]. In particular, the new design has lower throughput due to the cost of many transaction execution restarts.

The remainder of this paper is organized as follows. In Section 2 we present background and related work on blockchain systems. In Section 3 we describe the hardware systems and blockchain workloads used in this study. We also provide a detailed characterization of the hardware systems in this section. In the next two sections, we analyze the time and energy performance at single-node and cluster level. We conclude in Section 6.

2. BACKGROUND AND RELATED WORK

In this section, we provide a background on blockchain systems and survey the related work on time and energy performance analysis of blockchains.

2.1 Blockchain Systems

A blockchain is a distributed ledger running on a network of mutually distrusting nodes (or peers). The ledger is stored as a linked list (or chain) of blocks of transactions. The links in the chain are built using cryptographic pointers to ensure that no one can tamper with the chain or with the data inside a block.

Blockchains are most famous for being the underlying technology of cryptocurrencies, but many blockchains are able to support general-purpose applications. This ability is determined by the execution engine and data model. For example, Bitcoin [17] supports only operations related to cryptocurrency (or token) manipulation. On the other hand, Ethereum [7] can run arbitrary computations on its Turing-complete Ethereum Virtual Machine (EVM). At data model level, there are at least three alternatives used in practice. The Unspent Transaction Output (UTXO) model, used by Bitcoin, represents the ledger states as transaction ids and associated unspent amounts which are the input of future transactions. The account/balance model resembles a classic banking ledger. A more generic model, used by Hyperledger, consists of key-value states. On top of the data model, one can write general applications that operate on the blockchain's states. Such applications are called smart contracts. In this paper, we use the BLOCKBENCH benchmarks, which provide a set of smart contracts for Hyperledger, Ethereum and Parity.

Depending on how nodes can join the network, a blockchain is public or private (or permissioned). In public networks, anybody can join or leave and, thus, the security risks are high. Most of the cryptocurrency blockchains are public, such as Bitcoin [17] and Ethereum [7]. On the other hand, private blockchains allow only authenticated peers to join the network. Typically, private blockchains, such as Hyperledger [6] and Parity [18], are deployed inside or across big organizations.

Blockchains operate in a network of mutually distrusting peers, where some peers may not be just faulty but malicious. Hence, they assume a Byzantine environment, in contrast to the crash-failure model used by the majority of distributed systems. To ensure consistency among honest peers, most private blockchains use Byzantine fault-tolerant consensus protocols such as PBFT [8], whereas most public blockchains use proof-of-work (PoW) consensus protocols. In PoW, participating nodes, called miners, need to solve a difficult cryptographic puzzle. The node that solves the puzzle first has the right to append transactions to the ledger. On the other hand, PBFT consists of exchanging O(n²) messages among the nodes to reach agreement on the transactions to be appended to the blockchain. These consensus protocols are considered the Achilles' heel of blockchain due to poor time-energy performance. While PoW is scalable since it can run in parallel on all nodes, it is compute-intensive and, thus, it is both slow and power-hungry on traditional brawny servers. PBFT exhibits quadratic time growth with the number of nodes in the network, leading to energy wastage.

Our analysis in the next sections confirms that a PoW-based blockchain, such as Ethereum, uses more power compared to a PBFT- or PoA-based blockchain. A PBFT-based blockchain, such as Hyperledger, uses almost the same power as a PoA-based blockchain, such as Parity, on small networks of up to eight nodes.

2.2 The Time-Energy Analysis of Blockchains

There are a number of related works that analyze the performance of blockchains [11, 19]. However, only a few include energy analysis [21, 23], and the analysis is of limited depth.

BLOCKBENCH [11] is a benchmarking suite comprising both simple (micro) benchmarks and complex (macro) benchmarks. The micro benchmarks, namely CPUHeavy, IOHeavy and Analytics, stress different subsystems such as the CPU, memory and IO. On the other hand, the YCSB macro benchmark implements a key-value storage, while Smallbank represents OLTP and simulates banking operations. These benchmarks are implemented as smart contracts in Ethereum, Parity and Hyperledger. Their performance in terms of throughput and latency is evaluated on traditional high-performance servers with Intel Xeon CPUs. In this paper, we extend BLOCKBENCH to include time-energy analysis of a wider range of systems, with a focus on low-power nodes.

Sankaran et al. [21] analyze the time and energy performance of an in-house Ethereum network consisting of high-performance mining servers and low-power Raspberry Pi clients. These low-power systems cannot run Ethereum mining due to their limited memory size, hence, they only take the role of clients. In this paper, we run Ethereum full nodes on low-power devices with higher performance, such as Intel NUC and Jetson TX2. To the best of our knowledge, we are the first to run and analyze the time-energy performance of full-fledged blockchains on low-power systems.

MobiChain [23] is an approach that allows mining on mobile devices running Android OS, in the context of mobile commerce. While containing analysis of both time and energy performance, MobiChain has no comparison to other blockchains. In terms of energy analysis, the authors show that it is more energy-efficient to group multiple transactions in a single block since there is less mining work and therefore less time and power wasted in this process. However, larger blocks increase latency and result in poor user experience.

Jupiter [16] is a blockchain designed for mobile devices. It aims to address the problem of storing a large ledger on mobile devices with limited storage capacity. However, there is no time or energy performance evaluation.

To the best of our knowledge, we provide the first extensive time-energy performance analysis of blockchain systems on low-power, wimpy nodes in comparison with high-performance server systems.

3. EXPERIMENTAL SETUP

In this section, we describe our experimental setup, starting with the systems and ending with the workloads. We present a detailed characterization of the selected systems at CPU, memory, storage and networking levels. The results of this detailed characterization are summarized in Table 1.

Table 1: Systems characterization

3.1 Systems

We compare the time and energy performance of low-power systems against a high-performance traditional server system. This server system is based on an x86/64 Intel Xeon E5-1650 v3 CPU clocked at 3.5GHz, and has 32GB DDR3 memory, a 2TB hard disk (HDD) and a 1Gbps network interface card (NIC). It runs Ubuntu 14.04 with Linux kernel 3.13.0-95.

The low-power systems used for the analysis are (i) Intel NUC [2], (ii) NVIDIA Jetson TX2 [14] and (iii) Raspberry Pi 3 model B (RP3) [1]. The NUC system is based on an x86/64 Intel Core i3 CPU with two physical cores that support Hyperthreading and run at 2.4GHz. This system has 32GB DDR4, a 256GB solid-state drive (SSD) and a 1Gbps NIC. It runs Ubuntu 16.04 with Linux kernel 4.15.0-34.

The TX2 system is based on a heterogeneous 6-core 64-bit CPU with two NVIDIA Denver cores and four ARM Cortex-A57 cores clocked at more than 2GHz. The system has 8GB LPDDR4, a 32GB SD card and a 1Gbps NIC. TX2 runs Ubuntu 16.04 with Linux kernel 4.4.38-tegra of aarch64 (64-bit ARM) architecture.

The RP3 has a 4-core ARM Cortex-A53 CPU of 64-bit ARM architecture and 1GB of LPDDR2 memory. This system has a 64GB SD card that acts as storage and a 100Mbps NIC. It runs Debian 9 (stretch) with Linux kernel 4.9.80-v7+ (32-bit ARM).

We measure the power and energy consumption of these systems with a Yokogawa power meter connected to the AC lines. We report only AC power and energy values in this paper. We believe that these values are more useful compared to DC measurements since they reflect the final billable energy.
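The AC-side methodology above reduces, in essence, to integrating sampled power over the run and dividing by the elapsed time for the average. A minimal sketch of that bookkeeping; the trace, sampling interval and values below are illustrative, not measurements from the paper:

```python
# Sketch: turning periodic AC power samples (as a meter such as the
# Yokogawa reports them) into energy and average power.
# The 1 Hz trace below is made up for illustration.

def energy_joules(samples_w, dt_s):
    """Trapezoidal integration of power samples taken every dt_s seconds."""
    return sum((a + b) / 2.0 * dt_s for a, b in zip(samples_w, samples_w[1:]))

trace = [50.0, 52.0, 54.0, 52.0, 50.0]         # watts, one sample per second
energy = energy_joules(trace, dt_s=1.0)        # joules over the 4 s window
avg_power = energy / (1.0 * (len(trace) - 1))  # watts
print(energy, avg_power)                       # 208.0 52.0
```

In the tables that follow, energy is equivalently taken as execution time multiplied by average power, which is the denominator of the PPR metric.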
Figure 1: Performance and average power of the four systems: (a) CoreMark on one core; (b) CoreMark on all cores (Xeon: 12 cores, NUC: 4 cores, TX2: 6 cores, RP3: 4 cores); (c) Keccak256 on one core; (d) Keccak512 on one core. Performance is reported in IPS for CoreMark and MBPS for Keccak, power in W.
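The Keccak panels of Figure 1 report throughput as bytes hashed per second. A sketch of such a harness; hashlib.sha256 stands in for Keccak here (go-ethereum ships its own Keccak-256/512 implementation), and the 1 MB buffer is scaled down from the one billion bytes used in the paper:

```python
# Sketch: measuring hash throughput in MB/s, in the spirit of the
# Keccak256/512 experiment. sha256 is a stand-in for Keccak.
import hashlib
import os
import time

def hash_throughput_mbps(data: bytes, algo=hashlib.sha256) -> float:
    start = time.perf_counter()
    algo(data).digest()          # hash the whole buffer once
    elapsed = time.perf_counter() - start
    return len(data) / 1e6 / elapsed

buf = os.urandom(1_000_000)      # random input, as in the paper's setup
mbps = hash_throughput_mbps(buf)
```

On real hardware, repeating the measurement and averaging, as the paper does with at least three runs, smooths out timer and scheduler noise.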
3.2 Systems Characterization

Before analyzing the time and energy of blockchains on the selected systems, we evaluate the hardware at CPU, memory, storage and networking level to understand their relative performance. The measured values and system characteristics are summarized in Table 1.

We first measure idle system power when the hardware is running only the OS. We obtain 50W, 9W, 2.4W and 1.9W for Xeon, NUC, TX2 and RP3, respectively. These values clearly show the power efficiency gap between brawny nodes used in the majority of datacenters and supercomputers, and wimpy nodes used at the edge.

To assess CPU performance, we use the CoreMark benchmark, which is increasingly used by the industry, including vendors that equip their systems with ARM CPUs [4]. CoreMark measures CPU performance in terms of iterations per second (IPS). We present the performance and average power usage in Figure 1a and Figure 1b for CoreMark running on a single core and all cores, respectively. For multi-core analysis, we enable all available cores, including virtual cores in systems that support Hyperthreading. For example, we use twelve and four virtual cores on Xeon and NUC, respectively.

At single-core level, the performance of Xeon is 1.4, 2.1 and 7 times higher compared to NUC, TX2 and RP3, respectively. But this performance comes at the cost of 5.1×, 8× and 23.5× higher power consumption. However, we note that this is the power used by the entire system, which includes other components beside the CPU. We then estimate the power of the CPU by subtracting the idle system power. One Xeon core uses almost 20W, while one ARM core from RP3 uses only 1.1W. Hence, the performance-to-power ratio (PPR) of the RP3 is superior to that of the Xeon.

At multi-core level, TX2 exhibits better performance than NUC, mainly because of its six real cores compared to only two real cores on NUC. Moreover, TX2 uses less power than NUC to deliver higher performance. Therefore, it is expected that TX2 has a better time-energy performance for multi-threaded workloads. We also observe that the performance does not scale perfectly with the number of cores. For example, Xeon exhibits only a 7.4 times performance boost when 12 cores are used. TX2 performs better, with a 5.6 times performance increase when 6 cores are used. This sub-linearity is due to resource contention, both in-core and off-core [24].

Blockchain systems rely heavily on cryptographic operations that are CPU-intensive. We evaluate the CPU on this type of workload by measuring the performance and average power of the Keccak secure hash algorithm from go-ethereum v1.8.15, compiled with go 1.11. We run both Keccak256 and Keccak512 on a random input of one billion bytes. The throughput measured in MB per second (MBPS) represents the performance of these cryptographic algorithms on the selected systems. As shown in Figure 1c and Figure 1d, the performance trends are similar to CoreMark. RP3 exhibits much lower performance: almost 320× and 190× lower throughput compared to Xeon on Keccak256 and Keccak512, respectively. The lower system power of RP3 running these cryptographic operations compared to CoreMark suggests that the core is not fully utilized. In fact, it is often stuck in memory operations that use less power compared to arithmetic operations. As we shall see in the next paragraph, RP3's memory has significantly lower bandwidth than the other three systems.

We analyze the performance of the memory subsystem in terms of bandwidth. We use lmbench [22] to get the read-write bandwidth and plot the results in Figure 2. At level one cache (L1), Xeon has the highest bandwidth, which is almost 60GB/s, while NUC, TX2 and RP3 exhibit bandwidths of 37GB/s, 19GB/s and 6GB/s, respectively. This is expected since server-class processors, such as Xeon, have optimized caches. However, at the main memory level, NUC leads with a bandwidth of 12.5GB/s, followed closely by Xeon with 10GB/s. This lower performance of Xeon is attributed to the older DDR3 memory generation. TX2 and RP3 exhibit main memory bandwidths of less than 4GB/s and 1GB/s, respectively. This low bandwidth, together with the small memory size, hinders the execution of modern workloads on wimpy systems.

Figure 2: Memory bandwidth comparison (read-write bandwidth vs. memory access size, from 1kB to 1GB)

At storage level, there is a mixed performance profile since the systems are equipped with different types of storage mediums. To assess throughput and latency, we use the dd and ioping Linux commands, respectively. As expected, the SSD of NUC exhibits the highest throughput and the lowest latency. On the other hand, the SD cards used by TX2 and RP3 exhibit low throughput, high latency and significant read/write asymmetry. Since modern operating systems cache files or chunks in memory, we also measure the buffered read throughput. We observe that this throughput follows the memory bandwidth trend, except for NUC where the buffered throughput of 6.6GB/s is half of the memory bandwidth.

At networking level, we measure the bandwidth and latency using the iperf and ping Linux commands, respectively. As expected, RP3 exhibits lower TCP and UDP bandwidths since it is equipped with a 100Mbps NIC, compared to the Gigabit Ethernet NICs of the other systems. The slightly higher latency of TX2 and RP3 can be attributed to the lower clock frequency of the wimpy systems. To validate this hypothesis, we measured the networking latency while setting the clock frequency to a fixed step. TX2 supports twelve frequency steps in the range 346MHz-2.04GHz. We obtained a Pearson correlation coefficient of -0.93 between the twelve frequency steps and the corresponding latencies, suggesting strong inverse proportionality. For example, the networking latency at 346MHz is 0.33ms, while at 2.04GHz it is 0.25ms. On RP3, there are only two available frequency steps, but we obtained similar results. When setting the frequency to 600MHz and 1.2GHz, we obtained networking latencies of 0.35ms and 0.29ms, respectively.

Observation 1. In summary, the hardware systems have the following characteristics.

Observation 1.1. x86/64 wimpy devices, such as Intel NUC, are comparable to server systems at memory and storage level while using 5× less power. However, CPU performance is lower when running multi-threaded workloads due to the small number of cores.

Observation 1.2. High-end ARM-based wimpy devices, such as Jetson TX2, have the potential to achieve high PPR at the cost of lower time performance compared to x86/64 systems.

Observation 1.3. Low-end ARM-based devices, such as Raspberry Pi 3, suffer from low core clock frequency, small and low-bandwidth memory. These systems may not be able to run modern server-class workloads, including blockchains.

3.3 Workloads

We use BLOCKBENCH [11] with minor changes¹ to assess blockchain performance. We were not able to compile go-ethereum v1.4.18, evaluated in the original BLOCKBENCH paper [11], on TX2 due to issues with older versions of the go toolchain on the aarch64 architecture. We also encountered issues with the compilation of parity-ethereum v1.6.0 on all systems due to broken Rust packages. Hence, we use go-ethereum v1.8.15 compiled with go 1.11 and parity-ethereum v2.1.6 compiled with cargo 1.30.0 on all systems. For the Hyperledger experiments we use version v0.6, which supports PBFT consensus.

The micro-benchmarks in BLOCKBENCH assess the performance of different subsystems. CPUHeavy uses quicksort to sort an array of integers, while IOHeavy implements Write and Scan operations that touch key-value pairs to stress the memory and IO subsystems. The Analytics benchmark simulates typical OLAP workloads as found in traditional databases. It implements three queries. The first query (Q1) computes the total value of transactions between two blocks. The second (Q2) and third (Q3) compute the maximum transaction and the maximum account balance, respectively, between two blocks for a given account. This benchmark requires an initialization step that creates 120,000 accounts and generates over 100,000 blocks with an average of three transactions per block.

The macro-benchmarks in BLOCKBENCH are complex database applications stressing all key subsystems. For example, YCSB evaluates the performance of a key-value store with configurable read-write ratios, while Smallbank represents OLTP workloads by simulating banking transactions. The Donothing benchmark estimates the overhead of consensus protocols since it performs no computations and no IO operations inside the smart contract. In this paper, the macro-benchmarks are run on clusters of nodes.

All workloads are run at least three times. We report the average values and standard deviations.

¹The updated source code of BLOCKBENCH is available at https://github.com/dloghin/blockbench

3.4 Raspberry Pi 3 (RP3) Setup

RP3 is unable to run go-ethereum since it has only 1GB of RAM while Ethereum requires more than 4GB. Modifying go-ethereum to run on low-end wimpy devices is left to
future work. In this paper, we only report the performance
Table 2: Time, Power and PPR of Hyperledger (each cell lists the values for Xeon, NUC, TX2 and RP3, in this order)

Workload (Size) | Time Avg [s] | Time Std [s] | Power Avg [W] | Power Std [W] | PPR [ops/J]
CPUHeavy (1000000) | 1.0 1.0 1.1 2.4 | 0.0 0.0 0.0 1.7 | 50.6 9.0 2.4 2.1 | 1.7 0.2 0.0 0.1 | 19,459.2 106,885.5 383,004.3 308,875.7
CPUHeavy (10000000) | 1.2 1.2 1.7 2.5 | 0.0 0.0 0.0 0.0 | 52.0 10.9 3.8 2.7 | 1.0 0.7 0.6 0.2 | 165,735.7 739,181.8 1,597,910.5 1,494,017.5
CPUHeavy (100000000) | 2.8 3.9 8.3 17.7 | 0.0 0.2 0.2 1.1 | 72.9 14.8 4.6 2.9 | 1.0 1.2 0.1 0.1 | 485,530.3 1,720,378.6 2,607,959.9 1,978,696.9
IOHeavy Write (3200000) | 1,055.2 1,721.1 7,365.3 11,911.7 | 52.3 88.9 78.3 142.1 | 83.1 17.9 3.7 3.3 | 0.1 0.0 0.0 0.0 | 36.6 104.4 117.2 82.3
IOHeavy Write (6400000) | 2,125.1 3,473.2 14,675.3 27,246.0 | 85.3 139.9 114.0 309.6 | 83.0 17.9 4.5 3.1 | 0.1 0.1 1.1 0.0 | 36.3 103.1 102.9 74.9
IOHeavy Write (12800000) | 4,299.0 7,025.0 28,957.0 63,891.3 | 102.5 162.7 751.9 917.2 | 83.0 17.9 3.7 3.0 | 0.1 0.0 0.1 0.0 | 35.9 101.6 120.1 66.2
IOHeavy Scan (3200000) | 744.1 1,442.0 6,191.7 8,915.3 | 5.9 7.2 79.3 42.6 | 83.7 15.1 3.1 3.2 | 0.1 0.0 0.0 0.0 | 51.4 147.1 169.3 112.4
IOHeavy Scan (6400000) | 1,487.1 2,871.4 10,960.3 17,296.3 | 11.5 26.5 2,195.8 118.8 | 83.6 15.1 3.6 3.2 | 0.0 0.0 0.8 0.0 | 51.5 147.4 169.4 114.6
IOHeavy Scan (12800000) | 2,966.1 5,768.3 25,049.0 34,274.7 | 20.2 84.4 257.7 702.6 | 83.6 15.1 3.0 3.2 | 0.1 0.1 0.0 0.0 | 51.6 147.3 167.7 115.3
Analytics Q1 (10000) | 9.7 18.8 157.6 103.8 | 0.0 0.0 0.3 5.0 | 90.3 18.1 2.9 3.2 | 0.5 0.1 0.0 0.0 | 11.4 29.5 21.8 30.6
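The PPR column in Table 2 is operations per joule, i.e., operations divided by the time-power product. A quick check against the rounded averages in the table; note the published PPR values were computed from unrounded measurements, so recomputing from the rounded time and power lands near, but not exactly on, them:

```python
# Sketch: performance-to-power ratio (PPR) in ops/J, as used in Table 2.
def ppr(ops: int, time_s: float, power_w: float) -> float:
    """Operations per joule; energy = execution time * average power."""
    return ops / (time_s * power_w)

# CPUHeavy with 1,000,000 elements on Xeon: 1.0 s at 50.6 W (rounded
# table values); the table itself lists 19,459.2 ops/J from raw data.
xeon = ppr(1_000_000, 1.0, 50.6)   # roughly 19,762.8 ops/J
```

The same helper applies to every row: a system can win on PPR either by finishing faster or by drawing less power, which is exactly the trade-off the observations in this section discuss.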
Figure 4: Execution time [s] of geth v1.8.15 with an increasing number of miner threads, one panel per system (Xeon: up to 12 threads; NUC: up to 4; TX2: up to 6)
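One way to see why execution time can grow super-linearly with miner threads, given the restart-on-new-block behavior analyzed in Section 4.2, is a toy model of our own: if each of m miner threads seals empty blocks independently at some rate, block arrivals form a Poisson process whose rate scales with m, a transaction commits only after an uninterrupted window of its own length, and the expected number of attempts grows exponentially in m. The rate and transaction length below are invented for illustration, not measured:

```python
# Toy model: expected restarts under "abort on every new block".
# Block arrivals ~ Poisson(miners * block_rate_hz); an attempt succeeds
# only if no block arrives for tx_s seconds, so the success probability
# is exp(-miners * block_rate_hz * tx_s) and the expected number of
# attempts is its reciprocal.
import math

def expected_attempts(miners: int, block_rate_hz: float, tx_s: float) -> float:
    return math.exp(miners * block_rate_hz * tx_s)

one_thread = expected_attempts(1, 0.1, 5.0)    # e^0.5, about 1.6 attempts
four_threads = expected_attempts(4, 0.1, 5.0)  # e^2.0, about 7.4 attempts
```

The model also suggests why run-to-run variation is high: the number of block arrivals hitting any particular execution is random, so the realized restart count fluctuates around this expectation.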
input size increases because (i) the CPU utilization is low but roughly constant, and (ii) the memory and storage have lower dynamic power fluctuations [5]. For example, the average CPU utilization is 9.5% (3.5% standard deviation) and 8.1% (3.6% standard deviation) during IOHeavy Write and Scan, respectively, for 3.2 million key-value pairs.

The energy is the product of execution time and average power usage. Xeon and RP3 exhibit the highest energy cost due to high power usage for the former and long execution time for the latter. On the other hand, TX2 and NUC almost always use the lowest energy. TX2 is almost always the most efficient because of its lower power profile compared to NUC, and higher performance compared to RP3. We note that even if RP3 has a very low power profile, its memory and CPU limitations translate to a larger energy cost than systems with higher power profiles.

In summary, we make the following observation concerning Hyperledger execution.

Observation 2. The highest energy efficiency is achieved by low-power systems with a balanced performance-to-power profile, rather than systems with a low power profile but also low performance.

4.2 Ethereum

Figure 4 shows a super-linear increase in the execution time of go-ethereum (geth) v1.8.15 with an increasing number of miner threads on the three systems under evaluation. Recall that RP3 is unable to run Ethereum. To investigate the cause of the high execution time with more miner threads, we break down the execution time into three components as described in BLOCKBENCH [11]. We profiled geth with Go pprof, analyzing both the call graph and the cumulative execution time per routine. From our analysis, consensus starts with the call of go-ethereum/consensus/ethash.(*Ethash).Seal.func1, while execution starts with the invocation of go-ethereum/core/vm.(*EVMInterpreter).Run. The remaining time is spent at the application and data layers of the blockchain stack. We observed that execution takes longer as the number of miner threads increases. This would suggest that EVM is inefficient when more miner threads are used.

We observed high variations among executions of the same benchmark with the same number of miner threads. Table 3 shows five executions of CPUHeavy on NUC with four miner threads. The high variations are visible in go-ethereum releases starting from v1.8.14. Previous releases, including v1.4.18 used by BLOCKBENCH [11], exhibit relatively stable execution. We found that starting with go-ethereum v1.8.14 a transaction is started, or applied, multiple times, and that this number is inconsistent among different runs. For example, a CPUHeavy transaction is applied as few as 16 and as many as 481 times in go-ethereum v1.8.14. This is explained by the fact that go-ethereum v1.8.14 underwent a significant design change. Specifically, whenever a miner thread receives a new block, it discards any transactions currently being executed and applies the transactions in the newly received block. In our case, there is a single transaction, and during its execution the miners keep generating empty blocks. As a result, the probability of receiving a block during the transaction execution increases with the number of miners. Therefore, the transaction is interrupted and restarted many times. As we shall see in the next section, the same results hold for the cluster setting and for non-CPU-heavy workloads.

We note that this design works well when a newly received block contains updates to states currently used by the transaction being executed. In this case, it saves time to stop the current transaction and restart it after the new block is applied. However, interrupting transactions even when receiving an empty block results in unnecessary overhead. A more elegant approach is to restart only transactions whose states are affected by the new block.

In summary, we make the following observation concerning Ethereum execution.

Observation 3. In the latest versions of Ethereum, (i) execution time increases with the number of miner threads and (ii) there is high execution time variation among different runs of the same workload, especially when the workload is computation-heavy or when more miner threads are used. These are due to the new transaction restarting mechanism, which restarts execution when receiving a new block, even if that block is empty.

Table 3: Comparison of five CPUHeavy(1M) executions with four miner threads on NUC with different versions of Ethereum

Time-energy performance. The time, power and PPR of Ethereum running with one miner thread are shown in Table 4. Across systems, we observe the same pattern as that of Hyperledger. In other words, Observation 2 holds. Even if TX2 exhibits the highest execution time in general, its energy usage is the lowest and, thus, its PPR is the highest. For example, the IOHeavy Scan benchmark with 10,000 key-value pairs is 3.2× slower on TX2, but uses 4.9× less energy than on Xeon.

As expected, Ethereum uses more power than Hyperledger. In particular, for sorting 1M values, Xeon, NUC and TX2 use 50.6W, 9W and 2.4W, respectively, in Hyperledger, as opposed to 81W, 17W and 5W, respectively, in Ethereum. There are two reasons for this behavior. First, Ethereum uses more cryptographic operations, which incur high CPU utilization. Second, Ethereum uses EVM, an interpreted execution environment which is less efficient than Hyperledger's Docker execution. Consequently, the CPU performs more work in Ethereum.

Our evaluation demonstrates high variability, especially for IOHeavy operations, as indicated by the high standard deviations in Table 4. Execution profiling of IOHeavy Write shows that much of the time is spent in the EVM interpreter. For example, writing 10,000 key-value pairs on Xeon spends 71% of the time inside the EVM interpreter, while sorting one million numbers spends only 10% in the same routine. The root cause is the same as for running multiple miner threads, namely, the transaction is restarted multiple times until it manages to finish. Transactions that perform more work and take longer to finish have higher chances of being restarted and, thus, take even longer to finish under geth v1.8.15. For example, an execution of sorting one million numbers on Xeon finishes in 9s and restarts the transaction 2 times. In contrast, an execution of IOHeavy Write of 10,000 key-value pairs finishes in 458s and is restarted 63 times.

We note that we were unable to run CPUHeavy with input sizes of 10M and 100M. While the BLOCKBENCH paper [11] reports execution times for Ethereum CPUHeavy on 10M input, in our experiments the clients never finish the execution.

4.3 Parity

The time-power results for Parity are presented in Table 5. Unlike Ethereum, Parity is able to run on the wimpy RP3 system. On the other hand, none of the systems is able to run the CPUHeavy workload with 100M input.

Recall that RP3 is 2-3× slower than TX2 for Hyperledger. This gap is much bigger for Parity. In particular, RP3 is 8× slower than TX2 when running CPUHeavy with 10M input. Our profiling using Linux perf shows that RP3 spends significant time in libarmmem.so, which is a library for memory operations on ARM-based systems. This, together with a low CPU utilization of 10%, suggests that memory is the main bottleneck of Parity execution on RP3. In contrast, the other systems spend most of the time in the execution layer, i.e., inside the EVM interpreter.

The variability in execution time among different runs is less visible in Parity compared to Ethereum. Table 5 shows high standard deviations only for IOHeavy workloads with small input size. We attribute this to the memory hierarchy, especially to CPU caches and memory buffers that need time to warm up and may exhibit unpredictable behavior on shorter executions. Indeed, CPUHeavy and Analytics do not exhibit execution time variability. The former is not memory or I/O intensive. The latter includes an initialization step that warms up the caches and memory buffers.

As expected, the power consumption of Parity is lower compared to Ethereum, but higher when compared to Hyperledger. Taking the CPUHeavy workload as an example, Xeon, NUC and TX2 use 57.6W, 12.7W and 3.6W, respectively, to sort one million values in Parity. For the same amount of work, Ethereum consumes 81W, 17W and 5W on Xeon, NUC and TX2, respectively, while Hyperledger consumes
Table 5: Time, Power and PPR of Parity
Execution Time [s] Power [W]
Performance-to-Power Ratio [ops/J]
Workload Size Average Std. dev. Average Std. dev.
Xeon NUC TX2 RP3 Xeon NUC TX2 RP3 Xeon NUC TX2 RP3 Xeon NUC TX2 RP3 Xeon NUC TX2 RP3
1000000 64.9 71.1 147.1 1,205.5 26.0 0.0 3.7 6.5 57.6 12.7 3.6 2.6 1.2 0.3 0.1 0.0 324.2 1,106.6 1,910.1 316.0
CPUHeavy
10000000 469.7 705.1 1,371.0 12,205.4 4.3 0.2 83.0 367.2 71.5 14.7 4.5 2.7 1.2 0.4 0.0 0.0 298.1 967.4 1,634.5 302.0
100 84.8 42.1 30.8 62.2 15.3 15.2 0.2 45.3 50.7 8.8 2.5 2.1 0.8 0.0 0.0 0.0 0.0 0.3 1.3 1.2
IOHeavy Write 1000 170.4 96.0 106.7 287.0 84.1 26.0 40.0 5.6 52.8 10.5 3.0 2.5 0.1 0.2 0.1 0.1 0.1 1.1 3.6 1.4
10000 124.7 186.9 380.7 2,996.5 0.4 5.8 1.1 25.8 71.0 14.2 4.2 2.7 1.8 0.1 0.1 0.0 1.1 3.7 6.3 1.2
100 63.1 52.6 30.7 82.5 0.0 15.2 0.0 40.0 50.0 8.8 2.5 1.9 0.8 0.1 0.0 0.0 0.0 0.3 1.3 0.9
IOHeavy Scan 1000 42.2 149.0 51.9 72.2 15.1 30.0 30.0 40.0 50.1 8.7 2.5 2.0 0.9 0.0 0.1 0.0 0.5 0.8 10.1 9.7
10000 52.6 191.7 30.3 112.7 14.9 78.3 1.2 2.8 51.0 9.5 2.8 2.5 0.9 0.1 0.1 0.0 4.1 6.7 119.6 35.5
Analytics Q1 1000 1.2 2.0 10.5 14.2 0.0 0.0 0.3 2.3 51.1 14.0 2.8 2.5 1.3 1.4 0.0 0.2 16.3 36.9 34.8 28.1
Analytics Q2 1000 1.2 1.9 10.2 14.6 0.0 0.0 0.2 1.4 49.5 14.4 2.7 2.5 1.7 1.1 0.0 0.1 17.1 36.0 35.7 27.8
Analytics Q3 1000 0.5 0.7 1.8 4.0 0.0 0.0 0.1 0.0 50.3 13.5 4.6 2.9 1.7 2.3 0.1 0.0 42.1 102.8 108.5 85.0
50.6W, 9W and 2.4W, respectively. This behavior can be explained by the lower power overhead of Parity's PoA consensus protocol compared to Ethereum's PoW. Parity also has an interpreted EVM which is not as efficient as Hyperledger's Docker execution engine, and which draws additional power.

Observation 2 also holds for Parity. In particular, Xeon and NUC are the fastest systems, while TX2 uses the smallest amount of energy due to its shorter execution time than RP3 and lower power usage than Xeon and NUC.

4.4 Impact of Storage Subsystem
In this section, we analyze the impact of different types of storage subsystems on blockchain performance using the IOHeavy benchmarks. We select the TX2 system, which has interfaces for SD card, SATA storage and USB3 devices. While the baseline is the system with a 64GB SD card (TX2+SDC), we separately connect a 1TB SSD through SATA (TX2+SSD) and an external 2TB HDD through USB3 (TX2+HDD). The SD card stores the OS in both TX2+SSD and TX2+HDD.

We first measure the I/O performance in terms of raw read/write throughput and latency. Then, we measure the performance of the IOHeavy benchmarks in Hyperledger and Parity. Ethereum is not included in this analysis because of its unpredictable behaviour, as discussed in Section 4.2. In addition, we evaluate the impact of storage on the total power by measuring the idle power, when the hardware is running only the OS, and the active power, during blockchain execution. The results are summarized in Table 6.

Table 6: Impact of storage subsystem

Metric                     TX2+SDC   TX2+SSD   TX2+HDD
Idle System Power [W]      2.4       2.9       5.9
Write Throughput [MB/s]    16.3      206.0     87.6
Read Throughput [MB/s]     88.9      277.0     93.4
Write Latency [ms]         17.1      2.8       13.7
Read Latency [ms]          2.8       1.8       1.2
IOHeavy Write (10000)
  Hyperledger Time [s]     18.2      22.4      24.0
  Parity Time [s]          380.7     382.8     386.2
  Hyperledger Power [W]    7.2       7.7       10.0
  Parity Power [W]         4.2       4.8       7.7
IOHeavy Scan (10000)
  Hyperledger Time [s]     28.8      31.7      29.8
  Parity Time [s]          30.3      32.5      43.1
  Hyperledger Power [W]    3.1       3.6       6.6
  Parity Power [W]         2.8       3.3       6.4

In terms of raw performance, the SSD is the clear winner. It has almost 13× higher write throughput and 3× higher read throughput than the SD card, while adding only 0.5W to the idle power. In contrast, the HDD adds 3.5W to the idle power, thus increasing it by 2.5×. The HDD has more than 5× higher write throughput but similar read throughput compared to TX2+SDC.

Interestingly, Jetson with the SD card exhibits slightly better execution time when running IOHeavy. We attribute this to the fact that the SD card stores the OS, libraries and Docker containers in all three configurations, including TX2+SSD and TX2+HDD. Hence, the ledger storage subsystem is not a bottleneck; otherwise, TX2+SDC would exhibit higher execution time due to its lower raw throughput and higher latency. In fact, our profiling of IOHeavy Write shows that write operations are sparse, with an average of 1MB/s and a peak of 21.5MB/s across all subsystems. These values are within the capabilities of all storage subsystems, but switching between the execution contexts of the SD card and the ledger storage may induce overhead.

In terms of power, we observe that IOHeavy Scan adds only 0.7W and 0.4W to the idle power of all system configurations for Hyperledger and Parity, respectively. IOHeavy Write uses more power, adding between 4.1W and 4.8W for Hyperledger and around 1.8W for Parity. These results are stable, in general. The only notable exception is IOHeavy Scan in Parity on TX2+HDD which, in general, finishes in 32.5s, but in some cases finishes in 64s or 97s.

In summary, we make the following observation.

Observation 4. Wimpy nodes can accommodate conventional storage subsystems of large capacity, therefore they can store large ledgers. The storage subsystem type does not significantly affect the I/O performance of Hyperledger and Parity.

4.5 Bootstrapping Performance
In this section, we analyze the performance of bootstrapping, which is the process of joining a blockchain network and synchronizing the distributed ledger. We consider one node that joins an existing network of seven other nodes of the same type. Prior to the bootstrapping process, we generate over 100 blocks by running the YCSB workload on the 8-node blockchain network. We then stop one node, delete its ledger and caches, and restart it so that it synchronizes the ledger with the other nodes.

Hyperledger v0.6 adopts a lazy bootstrapping approach where synchronization is started only when new transactions are submitted. Hence, the execution time and power of synchronization and of transaction execution cannot be clearly separated. Here, we report the time taken by Hyperledger to update its block tip to a certain value. To synchronize around 2750 blocks, Hyperledger on Xeon takes 40s while using 51.25W. Interestingly, TX2 is faster than Xeon: it takes less than 20s
while using up to 3W. We attribute this to the networking setup. In particular, the Xeon cluster runs on NFS, which adds some overhead. We note that both systems use a relatively low power compared to their peak power. This is because the blocks are downloaded from the other peers without executing all transactions.

Ethereum supports three bootstrapping modes: fast, full and light [13]. In light mode, which is intended for wimpy systems, only the current state is downloaded from other peers. In fast and full mode, all blocks are downloaded. However, only in full mode are all the transactions applied, which means it is slower than fast mode. In our experiments, we do not consider light mode because it is very fast on both wimpy and brawny nodes. Ethereum takes 14.8s and 28.8s to synchronize around 2000 blocks in fast and full mode on Xeon, respectively, while using 120W. On TX2, it takes 57.2s and 6.7W to synchronize in full mode, and only 4s and 6W to synchronize in fast mode.

By default, Parity uses fast (or warp) synchronization which skips "almost all of the block processing" [12]. However, we observed that synchronizing the ledger in Parity takes much longer than in Ethereum, even when warp syncing is on. In particular, synchronizing 100 blocks in Parity takes over 4 hours on Xeon and over 3 hours on TX2, whereas in Ethereum it takes 2.6s and 2.2s on Xeon and TX2, respectively. This is a well-known issue in Parity^4, with some users blaming the I/O subsystem. But our profiling shows that the peak I/O write rate is around 1MB/s, which is much lower than the available throughput of the storage system. Moreover, the power during Parity's synchronization is close to the idle power: 51W and 2.4W on Xeon and TX2, respectively. This shows that Parity is not doing much work during the synchronization process. We therefore conclude that the synchronization inefficiency lies in Parity's implementation rather than in the hardware.

^4 For example, users report on StackExchange that synchronizing with the main network in 2018 took a few days (https://bit.ly/2UvIR1g)

5. CLUSTER ANALYSIS
In this section, we analyze the time-energy performance of blockchains on a cluster. We consider both homogeneous clusters, consisting of nodes of the same type, and heterogeneous clusters, consisting of multiple types of nodes.

5.1 Homogeneous Cluster
We consider Xeon-only and TX2-only clusters. The former is the fastest, the latter the most energy-efficient. We vary the cluster size from 2 to 8 nodes. The clients that issue requests run on separate nodes and, unlike the analysis in [21], they are not included in our performance evaluation. Our main focus is on the blockchain nodes.

5.1.1 Impact of request rate
We first examine the throughput, latency and power usage with increasing request rate. We fix the cluster size to 8 nodes, and use 8 clients to send transactions. We increase the transaction rate from 8 to 4096 transactions per second (tps). The results, depicted in Figure 5 for the YCSB benchmark, show that Hyperledger is able to sustain a throughput of up to 2220 tps and 630 tps on Xeon and TX2, respectively. Ethereum achieves a throughput of up to 39.7 tps and only 3.3 tps on Xeon and TX2, respectively. Parity achieves a maximum throughput of 30 tps and 25 tps on Xeon and TX2, respectively, when the client request rate is 512 tps. Similar patterns are observed when running the Smallbank and Donothing benchmarks.

[Figure 5 (plots): throughput, latency and power usage of the YCSB benchmark with increasing transaction rate, for Hyperledger, Ethereum and Parity on Xeon and TX2]

To achieve peak throughput, Hyperledger uses 618W on Xeon and only 26.4W on TX2. Parity uses even less power, ranging between 400W and 480W on Xeon, and between 20W and 26W on TX2. In contrast, Ethereum uses the most power, between 860W and 900W on Xeon and around 49W on TX2.

These results can be summarized in the following observation.

Observation 5. Higher-end wimpy nodes, such as Jetson TX2, achieve around one-third of Hyperledger throughput and almost the same performance for Parity compared to brawny Xeon nodes, while using 18× to 23× less power. These nodes have the potential of achieving significant power and cost savings.

Standard deviation is relatively low in most of the cases, with the highest outliers being the latency under high request rates. In particular, Hyperledger's latency exhibits a standard deviation of 111.5% and 42.5% on Xeon with 4096 tps and TX2 with 1024 tps request rate, respectively. For throughput, the maximum standard deviation is 101.1% for Ethereum on TX2 and 30.8% for Parity on Xeon. Power
consumptions have low standard deviation: below 1% on Xeon and 4.5% on TX2.

[Figure 6 (plots): (a) Throughput, (b) Latency, (c) Power of the YCSB benchmark with increasing number of nodes, for Hyperledger, Ethereum and Parity on Xeon and TX2]

Ethereum execution on TX2 is irregular, as shown in Figure 5, and has higher standard deviation compared to the other two blockchains. Moreover, Ethereum throughput is much lower and its latency is higher when using version v1.8.15 compared to v1.4.18 evaluated in BLOCKBENCH [11]. As shown in Figure 7 for YCSB, v1.4.18 achieves a maximum of 284.4 tps for a transaction request rate of 1024 tps, while v1.8.15 achieves only 39.7 tps. The increase in latency is relatively smaller, with maximum latencies of 137 and 154 seconds for v1.4.18 and v1.8.15, respectively.

[Figure 7 (plots): throughput and latency vs. transaction rate for geth v1.4.18 and v1.8.15]
Figure 7: Throughput and latency comparison between different versions of Ethereum

[Figure 8 (plot): CDF of apply transaction count for geth v1.4.18 and v1.8.15]
Figure 8: Comparison of apply transaction count distribution in two versions of Ethereum

We note that the higher throughput reported for v1.4.18 is attributed to (i) different parameter settings and, more fundamentally, to (ii) a design change in Ethereum. First, there are changes in gas values in the newer versions. This requires increasing the gas value to 0x10000 in order to run the YCSB benchmark on v1.8.15. Second, a transaction may be restarted multiple times in v1.8.15, as discussed in Section 4.2.

To understand the second factor contributing to the low throughput, we profile the code to record the number of times a unique transaction, as represented by its hash, is restarted, or applied. Even though the average number of times a transaction is applied is similar, namely 20 times, we observed that a higher number of unique transactions are executed by geth v1.4.18 than by geth v1.8.15. These unique transactions are reflected in the throughput. Furthermore, transactions are restarted more often in geth v1.8.15, as shown in Figure 8. The maximum number of restarts in v1.8.15 is much higher than in v1.4.18, namely 183 times versus 105 times.

5.1.2 Impact of network size
Next, we examine the scalability with increasing number of blockchain nodes and clients. We use the same number of clients as the number of nodes. We choose a request rate that saturates the systems, as identified in the previous section. In particular, for Xeon we set the rate per client node to 512, 8 and 64 tps for Hyperledger, Ethereum and Parity, respectively. On TX2, we set the rate per client to 128, 4 and 64 tps.

Figure 6 shows the throughput for YCSB with increasing number of nodes. We attribute the fluctuations of Ethereum on TX2 to the non-deterministic transaction restarting mechanism. The lower throughput, when compared to Xeon, is due to the compute-intensive PoW consensus protocol. In fact, the power usage of Ethereum is 2× higher than Hyperledger and Parity on TX2. Specifically, 6 TX2 nodes use, on average, 37.9W, 19.8W and 17.4W when running Ethereum, Hyperledger and Parity, respectively.

The latency of Ethereum increases significantly on TX2, from 46.7s on 2 nodes to 195.6s on 8 nodes. This is 4.5× higher than the latency on 8 Xeon nodes. On the other
hand, Parity's latency decreases with the number of nodes: from 87.4s on 2 nodes to 46.7s on 8 nodes. In summary, Ethereum is virtually unusable on wimpy systems due to (i) low throughput and high latency caused by PoW consensus, and (ii) unstable performance due to transaction restarting.

[Figure 9 (bar charts): peak throughput and power of YCSB on homogeneous and heterogeneous clusters; (a) Hyperledger, (b) Parity]

5.2 Heterogeneous Cluster
In this section, we examine the effects of heterogeneous nodes on the overall blockchain performance. The baselines of homogeneous clusters are represented by (i) 4 Xeon nodes and (ii) 4 TX2 nodes. From the homogeneous Xeon cluster, we replace two nodes with TX2 (Xeon+TX2); from the homogeneous TX2 cluster, we replace two nodes with RP3 (TX2+RP3). We run the distributed benchmarks for Hyperledger and Parity. Ethereum is left out because it cannot be run on RP3.

As shown in Figure 9 for the peak throughput of YCSB, the performance degrades when lower-performance nodes are introduced. But the power consumption improves because the heterogeneous cluster uses less power. In particular, Xeon+TX2 has a performance drop of 35% but uses 53% less power than the homogeneous Xeon cluster when running Hyperledger. The results are better for Parity, where a 43% power saving causes only a 10% loss of throughput. However, adding RP3 nodes to a TX2 cluster does not yield satisfactory results. For Hyperledger, the throughput drops by 62%, while the power decreases only slightly, from 13.4W to 11.8W (only 12% power savings). For Parity, the power consumption of the heterogeneous cluster is even higher than that of the homogeneous cluster, 12.8W versus 12W, while the throughput drops from 15.3 tps to 11.5 tps.

Similar to the analysis of homogeneous clusters, the results here demonstrate that higher-end wimpy nodes have the potential of reducing power usage while achieving reasonable performance. However, heterogeneous clusters with wimpy nodes may not always achieve the best PPR. More specifically, if the performance gap between different types of nodes is too large, the low-power profile of the wimpy nodes does not lead to better energy efficiency due to lower throughput and increasing latency.

6. CONCLUSIONS
In this paper, we performed an extensive time-energy analysis of representative blockchain workloads on low-power, wimpy nodes in comparison with traditional brawny nodes. The wimpy nodes used in our analysis cover the low-end and high-end performance spectrum, and both x86/64 and ARM architectures.

We found that higher-end wimpy nodes achieve reasonable performance with significantly lower energy than brawny nodes. In particular, a Jetson TX2 cluster with eight nodes achieves more than 80% and almost 30% of Parity and Hyperledger throughput, respectively, while using 18× and 23× less power, respectively, than an 8-node Xeon cluster.

We also found that wimpy nodes with well-balanced PPR achieve higher energy efficiency compared to extremely low-power nodes. For example, a TX2 is more energy-efficient than a Raspberry Pi 3, even though the former has an idle power of 2.4W and a peak power of more than 10W, while the latter has 2W and 5W, respectively. The better energy efficiency of TX2 compared to RP3 is due to its higher performance while keeping a low power profile at subsystem level, including the CPU, memory and storage.

Finally, we found that recent versions of Ethereum suffer from low and unstable performance. This is due to the transaction restarting mechanism that stops and discards transaction execution whenever new blocks are received, even if those blocks are empty. This fact, together with the high cost of the PoW consensus protocol, makes Ethereum unusable on wimpy nodes.

7. REFERENCES
[1] Raspberry Pi 3 Model B. http://bit.ly/1WTq1N4, 2016.
[2] Intel NUC Kit NUC7i3BNH. http://www.webcitation.org/74GPLkSah, 2017.
[3] Ubuntu Docs - What is swappiness and how do I change it? http://www.webcitation.org/76QeVALC9, 2018.
[4] ARM. ARM Announces Support For EEMBC CoreMark Benchmark. http://www.webcitation.org/6RPwNECop, 2009.
[5] L. A. Barroso, J. Clidaras, and U. Hoelzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition. Morgan and Claypool Publishers, 2nd edition, 2013.
[6] T. Blummer. An Introduction to Hyperledger. 2018.
[7] V. Buterin. A Next-Generation Smart Contract and Decentralized Application Platform. 2013.
[8] M. Castro and B. Liskov. Practical Byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation, OSDI '99, pages 173–186, Berkeley, CA, USA, 1999. USENIX Association.
[9] Digiconomist. Bitcoin Energy Consumption Index. http://www.webcitation.org/74GL5jBxg, 2018.
[10] Digiconomist. Ethereum Energy Consumption Index (beta). http://www.webcitation.org/74GLngHMZ, 2018.
[11] T. T. A. Dinh, J. Wang, G. Chen, R. Liu, B. C. Ooi, and K.-L. Tan. BLOCKBENCH: A Framework for Analyzing Private Blockchains. In Proc. of 2017 ACM International Conference on Management of Data, pages 1085–1100, 2017.
[12] Parity Tech Documentation. Getting Synced. https://wiki.parity.io/Getting-Synced, 2019.
[13] EtherWorld. Understanding Ethereum Light Node. http://www.webcitation.org/77lSRvuey, 2018.
[14] D. Franklin. NVIDIA Jetson TX2 Delivers Twice the Intelligence to the Edge. http://www.webcitation.org/73M0i1pIf, 2017.
[15] V. Gupta and K. Schwan. Brawny vs. Wimpy: Evaluation and Analysis of Modern Workloads on Heterogeneous Processors. In Proc. of 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pages 74–83, 2013.
[16] S. Han, Z. Xu, and L. Chen. Jupiter: A Blockchain Platform for Mobile Devices. In Proc. of 34th IEEE International Conference on Data Engineering (ICDE), pages 1649–1652, 2018.
[17] S. Nakamoto. Bitcoin: A Peer-to-peer Electronic Cash System. 2008.
[18] Parity.io. Blockchain Infrastructure for the Decentralised Web. 2018.
[19] S. Pongnumkul, C. Siripanpornchana, and S. Thajchayapong. Performance Analysis of Private Blockchain Platforms in Varying Workloads. In Proc. of 26th International Conference on Computer Communication and Networks, pages 1–6, 2017.
[20] N. Rajovic, L. Vilanova, C. Villavieja, N. Puzovic, and A. Ramirez. The Low Power Architecture Approach Towards Exascale Computing. Journal of Computational Science, 4(6):439–443, 2013.
[21] S. Sankaran, S. Sanju, and K. Achuthan. Towards Realistic Energy Profiling of Blockchains for Securing Internet of Things. In Proc. of 38th IEEE International Conference on Distributed Computing Systems, pages 1454–1459, 2018.
[22] C. Staelin and L. McVoy. lmbench - system benchmarks. http://www.webcitation.org/74EthsKEa, 2018.
[23] K. Suankaewmanee, D. T. Hoang, D. Niyato, S. Sawadsitang, P. Wang, and Z. Han. Performance Analysis and Application of Mobile Blockchain. In Proc. of International Conference on Computing, Networking and Communications, pages 642–646, 2018.
[24] B. M. Tudor, Y. M. Teo, and S. See. Understanding Off-Chip Memory Contention of Parallel Programs in Multicore Systems. In Proc. of International Conference on Parallel Processing, pages 602–611, 2011.
[25] L. van Doorn. Enabling Cloud Workloads Through Innovations in Silicon. http://www.webcitation.org/6t33R0NZg, 2017.
APPENDIX
A. ADDITIONAL RESULTS

Hyperledger on RP3. Figure 10 compares three different runs of Hyperledger with and without explicitly calling Go's garbage collector on RP3. Figure 10a represents the same execution plotted in detail in Figure 3. In almost all cases, Hyperledger with explicit GC invocation uses less memory and is as fast, if not faster, than Hyperledger without explicit GC invocation. On the other hand, the GC incurs more mmap/munmap system calls. On average across our experiments, Hyperledger with explicit GC incurs 70 mmap and 4 munmap calls, respectively, while Hyperledger without GC invocation incurs 50 mmap and 2 munmap calls, respectively.

Ethereum. Figure 11 compares the execution time and power usage of different runs of the same CPUHeavy workload on Ethereum with four miner threads, when running on the NUC node. We observe significant execution time differences, while the power is roughly constant at around 23W. Compared to the idle power of 9W and the CoreMark power of 18.6W, Ethereum's power usage is higher, suggesting that the system is doing heavy work not only at CPU level, but also at memory and I/O.

Table 7 compares the number of times transactions are restarted (applied) in two versions of Ethereum on a cluster setup with varying request rate. We present the minimum, maximum and average times, with standard deviation, across all unique transactions. We also show how many unique transactions are executed and the total number of times the ApplyTransaction() method is called.

Single-node Time-Power-Energy. Execution time, power usage and total energy of CPUHeavy and IOHeavy workloads are plotted in Figures 12, 13 and 14 for Hyperledger, Ethereum and Parity, respectively.

Cluster Performance. The throughput, latency and power usage of Smallbank and Donothing workloads at cluster level are plotted in Figures 15, 16, 17, 18, 19 and 20. Figures 15 and 16 reflect the performance with varying transaction request rate for Smallbank and Donothing, respectively. Figures 17 and 18 show the performance on increasing number of nodes. Figures 19 and 20 show the performance of Smallbank and Donothing, respectively, on heterogeneous clusters, as discussed in Section 5.2.

[Figure 10 (plots): memory used [MB] over the timeline [s] of Hyperledger on RP3, with and without explicit GC invocation (swappiness 60 and 10), across the CPUHeavy deploy, IOHeavy deploy and CPUHeavy phases; (a) Run 1, (b) Run 2, (c) Run 3]
[Figure 11 (bar chart): execution time [s] and power [W] across five runs of CPUHeavy]
Figure 11: Difference in Ethereum CPUHeavy execution

Table 7: Number of times transactions are restarted (applied) in two versions of Ethereum, on a cluster setup with varying request rate (columns correspond to increasing request rates)

v1.4.18:
  Min        8          1          1          1          1          1
  Max        71         59         92         106        105        143
  Average    29.7       27.8       22.2       19.1       14.9       13.0
  Std.dev.   9.8        7.6        14.4       14.6       14.3       11.8
  Unique     19,531     38,832     74,535     122,500    125,093    126,455
  Total      580,507    1,080,921  1,653,870  2,342,661  1,863,023  1,641,687

v1.8.15:
  Min        1          1          1          1          1          1
  Max        528        385        497        246        183        114
  Average    21.0       18.4       20.8       18.8       19.4       21.1
  Std.dev.   17.4       14.9       16.3       14.5       14.5       18.4
  Unique     10,935     12,289     10,713     10,957     10,826     10,725
  Total      229,754    226,488    222,503    206,357    209,560    226,455
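The per-transaction statistics behind Table 7 can be summarized as below. This is our own sketch of the tallying, with a hypothetical applyCounts map keyed by transaction hash; it is not geth's actual instrumentation.

```go
package main

import "fmt"

// applyCounts maps a transaction hash to the number of times
// ApplyTransaction was invoked for it.
type applyCounts map[string]int

// summarize returns the min, max and average apply count, the number
// of unique transactions, and the total number of ApplyTransaction
// calls, matching the rows of Table 7 (except the standard deviation).
func summarize(c applyCounts) (min, max int, avg float64, unique, total int) {
	first := true
	for _, n := range c {
		if first {
			min, max = n, n
			first = false
		}
		if n < min {
			min = n
		}
		if n > max {
			max = n
		}
		total += n
	}
	unique = len(c)
	if unique > 0 {
		avg = float64(total) / float64(unique)
	}
	return
}

func main() {
	// hypothetical counts for three transactions
	counts := applyCounts{"0xaa": 1, "0xbb": 3, "0xcc": 8}
	min, max, avg, unique, total := summarize(counts)
	fmt.Println(min, max, avg, unique, total) // prints: 1 8 4 3 12
}
```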
[Figure 12 (plots): execution time, power usage and total energy of the CPUHeavy and IOHeavy workloads on Hyperledger (Xeon, TX2, RP3)]

[Figure 13 (plots): execution time, power usage and total energy of the CPUHeavy and IOHeavy workloads on Ethereum]

[Figure 14 (plots): execution time, power usage and total energy of the CPUHeavy and IOHeavy workloads on Parity (Xeon, NUC, TX2, RP3)]
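For reference, the energy and PPR metrics reported in these figures and in Table 5 derive from the measured time and power, with energy approximated as average power multiplied by execution time. The sketch below uses illustrative numbers only, not measurements from the paper.

```go
package main

import "fmt"

// energy approximates the consumed energy in joules from the
// average power (watts) and the execution time (seconds).
func energy(watts, seconds float64) float64 {
	return watts * seconds
}

// ppr is the performance-to-power ratio in operations per joule.
func ppr(ops, watts, seconds float64) float64 {
	return ops / energy(watts, seconds)
}

func main() {
	// illustrative: 1M operations at an average of 50W for 60s
	fmt.Printf("energy: %.0f J\n", energy(50, 60))    // energy: 3000 J
	fmt.Printf("PPR: %.1f ops/J\n", ppr(1e6, 50, 60)) // PPR: 333.3 ops/J
}
```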
[Plots: throughput, latency and power usage vs. transaction rate, for Hyperledger, Ethereum and Parity on Xeon and TX2]
Figure 15: The performance of Smallbank benchmark with increasing transaction rate

[Plots: throughput, latency and power usage vs. transaction rate, for Hyperledger, Ethereum and Parity on Xeon and TX2]
Figure 16: The performance of Donothing benchmark with increasing transaction rate

[Plots: (a) Throughput, (b) Latency, (c) Power vs. number of nodes, for Hyperledger, Ethereum and Parity on Xeon and TX2]
Figure 17: The performance of Smallbank benchmark with increasing number of nodes
[Plots: (a) Throughput, (b) Latency, (c) Power vs. number of nodes, for Hyperledger, Ethereum and Parity on Xeon and TX2]
Figure 18: The performance of Donothing benchmark with increasing number of nodes

[Figure 19 (bar charts): throughput and power of Smallbank on heterogeneous clusters; (a) Hyperledger, (b) Parity]

[Figure 20 (bar charts): throughput and power of Donothing on heterogeneous clusters; (a) Hyperledger, (b) Parity]