Concurrent Programming Case Study_ S3 Metadata Requests _ by Joshua Robinson _ Medium
Concurrent Programming Case Study_ S3 Metadata Requests _ by Joshua Robinson _ Medium
Listen Share
Recently a FlashBlade customer had a challenge with listing all custom metadata on
objects in a large bucket. A standard S3 LIST request does not return custom
metadata, therefore the task requires also issuing a HEAD request for each object.
With millions of objects in a bucket, the naive Python approach was serialized and
therefore painfully slow. The FlashBlade backend is an all-flash object store
designed for high throughput and metadata performance, so the question naturally
became how to better utilize the storage backend to complete the requests faster?
Writing performant and correct asynchronous code is hard. Inevitably, many threads
need to concurrently access and modify shared state, leading to lots of nasty bugs.
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 1/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
While Python is the easiest language to work in general, it’s subtle and difficult to
write performant, parallelized Python code. In contrast, Go and Rust make writing
concurrent code relatively easier. In fact, writing concurrent Go code is no harder
than writing single-threaded Go code, which is unsurprising given the language’s
design goals. Writing concurrent Rust is more challenging but correctness is mostly
baked-in, unlike the equivalent C++ code (unless coroutines eventually save the
day!). Go and Rust both have significantly better writability/performance tradeoffs
than Python.
Results
Go and Rust are faster because they are modern compiled languages, whereas
Python is an interpreted language with limited concurrency due to the use of a
global interpreter lock (GIL). The best parallelized Python result was still 3x and 4x
slower than the Go and Rust versions.
It was surprising to me, though, that Rust was 30% faster than Go in this
embarrassingly-parallel and straightforward workload. Further, the Rust binary
used ~18% less CPU than the Go binary and 3x less CPU than the Python multi-
process version.
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 2/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
In all of these tests, the HEAD object latencies are consistently between 0.50–1.0 ms,
indicating that the backend FlashBlade is not overloaded. Synthetic benchmarking
shows that the FlashBlade can service at least 10x the highest request rate given
multiple clients.
Code Walkthroughs
One of the key challenges with this concurrency example is the need to scale to
billions of objects in a large bucket. This means spawning a new thread or process
per object is inefficient.
In some ways this is an “easy” concurrency problem because each concurrent task
issues a HEAD object request and then prints to stdout any custom metadata. No
data needs to be returned from each task, therefore there is no need to track
pending results and aggregate returned data. We only need to ensure that all
pending requests complete before finishing the program.
There are almost certainly ways to improve performance on each of these examples,
but I have tried to keep programming effort constant across these programs. I would
consider myself an average Python and Go programmer and an inexperienced Rust
programmer. But I have spent several years writing performant and highly
asynchronous C++ code, so I am always looking for languages that ease the pain of
writing concurrent code.
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 3/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
Python Baseline
First, I introduce the basic Python elements that make up the single-threaded
implementation in Python. This program retrieves object listings in groups of 1000
(default value) and then sequentially issues HEAD requests to each object before
moving on to the next LIST request. There is only ever one outstanding request,
LIST or HEAD, at a time.
import boto3
FB_DATAVIP='10.62.64.200'
s3 = boto3.resource('s3', endpoint_url='http://' + FB_DATAVIP)
Next, the following code lists all objects in a bucket using the ContinuationToken to
retrieve pages of 1000 keys per request:
while True:
objlist = s3.meta.client.list_objects_v2(**kwargs)
keys = [o['Key'] for o in objlist.get('Contents', [])]
for key in keys:
check_for_custom_metadata(s3, bucketname, key)
try:
kwargs['ContinuationToken'] = objlist['NextContinuationToken']
except KeyError:
break
Note that there are helpers which simplify the listing code above, but I leverage this
per-page logic later for concurrency later. Finally, a head_object() request retrieves
any custom metadata:
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 4/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
Python Multiprocessing
To issue the HEAD requests concurrently, I utilize the python multiprocessing
library. There are two ways to run Python multiprocessing: threads or separate
processes. A ThreadPool creates multiple threads of execution within the same
process but those threads are still subject to GIL requirements of only a single
thread executing code. This means concurrent network requests can be issued but
processing is limited to a single thread. In contrast, process-level concurrency
creates independent processes, each with its own GIL, enabling true concurrency at
the cost of making information sharing between processes more challenging.
import multiprocessing
p = multiprocessing.pool.ThreadPool(multiprocessing.cpu_count())
Due to the GIL, threading in Python gives a relatively small speedup (1.5x) and so we
must use process-level concurrency. The core challenge moving to process-level
concurrency is that the s3 client has internal state and cannot be shared across
multiple processes. This means we need a way to ensure that each process has its
own s3 client but since this is an expensive operation we want this to only happen
once per process. When creating a new process, we can pass in a custom
initialization function like this:
def initialize():
global s3_client
s3_client = boto3.resource('s3', \
aws_access_key_id=AWS_KEY, \
aws_secret_access_key=AWS_SECRET, \
use_ssl=False, endpoint_url='http://' + FB_DATAVIP)
And now the “check_for_custom_metadata()” function uses the global variable for
the s3 client instead of a passed argument.
The process pool version then looks almost identical to the threading version.
import multiprocessing
p = multiprocessing.Pool(multiprocessing.cpu_count(), initialize)
… # Listing logic repeated here
keys = [o['Key'] for o in objlist.get('Contents', [])]
# Start async HEAD ops and then continue listing.
for k in keys:
p.apply_async(check_for_custom_metadata, (bucketname, k))
Finally, note that both the multiprocessing versions need to wait after the list
operation completes so that all outstanding HEAD requests complete before exiting.
p.close()
p.join()
A first attempt might be to spawn a new goroutine for each head request, but this
does not scale to large buckets with millions of objects due to the way that the S3
SDK creates tcp connections for each goroutine. In my testing, this resulted in slow
performance and running out of available tcp ports. There are many other
concurrency scenarios where creating a new goroutine per task is sufficient, like an
HTTP server handling requests.
Connecting to FlashBlade S3
Import the following in order to use the AWS SDK:
import (
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/awserr"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/s3"
)
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 7/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
endpointUrl := "http://10.62.64.200"
s3Config := &aws.Config{
Region: aws.String("us-east-1"), // ignored
DisableSSL: aws.Bool(true),
S3ForcePathStyle: aws.Bool(true),
HTTPClient: httpClient,
}
if endpointUrl != "" {
s3Config.Endpoint = &endpointUrl
}
sess := session.Must(session.NewSession(s3Config))
svc := s3.New(sess)
The region can be set to any valid value as it will not be used for FlashBlade
connections. Disabling SSL is optional and not recommended for production
environments with proper certificates installed on the FlashBlade. Note the
HTTPClient setting, which I will discuss later in the section on TCP connection
reuse.
Concurrency Logic
The following function lists the keys in a bucket using ListObjectPages, which is a
paginated helper that hides the continuation logic. This function takes a function
object to be called with the result of each page of 1000 returned keys. In this case,
each key is added to the shared channel.
err := svc.ListObjectsPages(&s3.ListObjectsInput{
Bucket: bucketname,
Prefix: &pfix,
}, func(p *s3.ListObjectsOutput, _ bool) (shouldContinue bool) {
for _, v := range p.Contents {
channel <- *v.Key
}
return true
})
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 8/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
reportAWSError(err)
close(channel)
}
Once the listing is completed, the channel is closed, indicating no more values will
be sent.
The main routine creates the channel and then starts the listing operation in a
separate thread using the function just described. This channel needs an explicit
size of more than 1000 so that the next list request can start before all head_object
requests have completed. If the channel fills up, the sender will block temporarily
until more head requests have completed.
A fixed number of worker goroutines will read keys from the shared channel and
issue HeadObject requests.
var wg sync.WaitGroup
workerFunc := func() {
defer wg.Done()
for k := range channel {
input := &s3.HeadObjectInput{
Bucket: &bucketname,
Key: &k,
}
The main program needs to wait to exit until all the work is done. This is done in
two phases: first, once the listing is completed, the channel is “closed.” Each worker
goroutine will continue reading keys from the channel and then exit once the
channel is exhausted. A waitgroup tracks all outstanding workers and holds up
completion of the main thread until all the work is done.
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 9/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
The last section of code starts the worker goroutines and then waits for each to
finish:
wg.Add(workerCount) Search
for i := 0; i < workerCount; i++ {
go workerFunc()
}
wg.Wait()
The workerCount variable was originally set to the number of cores (16 in my case)
but I found the 2x increase in goroutines to yield 6% higher requests/sec.
In AWS, there is a limit of 5500 HEAD requests/sec, so the TCP connection reuse
issue is much less likely to arise. In contrast, FlashBlade imposes no per-client
limitations and has orders of magnitude higher metadata performance in even the
smallest configurations. Similarly, my Python programs have not experienced this
issue due to their far lower requests/sec rate.
To alleviate TCP port number exhaustion, the above code creates a custom HTTP
client for the AWS SDK via these instructions. Set the MaxHostIdleConns to a much
higher number (e.g., 100) than the default value of 2 so that connections will be
reused instead of quickly closed.
Rust: Tokio
Rust provides concurrency primitives async and await. With these, concurrent code
can be written similarly to single-threaded code yet still executed with high
concurrently. First, the compiler transforms async code blocks into future
structures, and second, a “runtime” is required to run a future object and make
progress.
Asynchronous Rust code does not run without a runtime, and Tokio is the most
widely-used async runtime in the Rust ecosystem.
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 10/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
Rust futures are very different from Goroutines or threads. Instead of each future
mapping to a thread, Tokio runs a set of threads that poll and advance futures until
they complete. Calling an async function does not execute code, but rather creates a
future which needs to be given to the Tokio runtime for execution.
The following code example opens and reads a file. The two “.await?” calls indicate
to yield if the results are not ready yet, letting the runtime schedule another future
instead.
Connecting to FlashBlade S3
I use the AWS SDK for Rust, which is currently in developer preview and not yet
ready for production use. There are other widely-used community implementations,
rust-s3 and rusoto, but I use the AWS SDK in anticipation of it growing in
importance.
[dependencies]
aws-config = “0.6.0”
aws-sdk-s3 = “0.6.0”
aws-endpoint = “0.6.0”
tokio = { version = “1”, features = [“full”] }
http = “0.2”
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 11/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
futures = “0.3”
awaitgroup = “0.6”
Concurrency Logic
The Rust listing loop is similar to the Python structure in the use of a continuation
token, and it creates a new task for the head_request() operation for each key.
loop {
let resp =
client.list_objects_v2().bucket(bucket).prefix(prefix).continuation_
token(continuation_token).send().await?;
for object in resp.contents().unwrap_or_default() {
let key = object.key().unwrap_or_default();
… // start new task here
}
if resp.is_truncated() {
continuation_token =
resp.next_continuation_token().unwrap().to_string();
} else {
break;
}
}
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 12/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
The middle of this loop dispatches a new task for each key returned, where the task
issues a head_object() request using spawn().
…
let req = client.head_object().bucket(bucket).key(key).send();
let worker = wg.worker();
tokio::spawn(async move {
let resp = match req.await {
Ok(r) => r,
Err(e) => panic!("HeadObject Error: {:?}", e),
};
let info = resp.metadata().unwrap();
for (key, value) in info.iter() {
println!("{}: {}", key, value);
}
worker.done();
});
…
Note that spawn returns a handle that can be used to retrieve results, but we ignore
that since the output goes to stdout.
The WaitGroup “worker()” call tracks the number of outstanding tasks. This
mechanism keeps an effective reference count of all outstanding tasks so that the
main thread can wait before exiting.
wg.wait().await;
Ok(())
}
Unlike Python or Go, Rust by default compiles a debug release and so it’s important
to compile with the “--release” flag for performance, otherwise requests/sec is
significantly lower (roughly 3x faster with release mode).
Summary
Writing applications that leverage concurrency effectively is a challenge. As an
example, to issue HEAD requests for every object in a large bucket requires an
application to juggle many concurrent in-flight operations. This is especially true
with a backend like FlashBlade capable of handling 100s of thousands to millions of
concurrent requests per second.
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 13/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
Follow
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 14/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
Joshua Robinson
102 3
Joshua Robinson
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 15/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
46 1
Joshua Robinson
93 2
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 16/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
Joshua Robinson
44
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 17/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
Nidhey Indurkar
How did PayPal handle a billion daily transactions with eight virtual
machines?
I recently came across a reddit post that caught my attention: ‘How PayPal Scaled to Billions of
Transactions Daily Using Just 8VMs’…
598 9
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 18/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
Lists
Dirk Michel
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 19/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
Konstantinos Patronas
30
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 20/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
David Goudet
7.6K 77
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 21/22
12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium
Amazon Simple Storage Service (S3) is a cornerstone of cloud storage solutions, offering
scalability, durability, and unmatched…
28 1
https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 22/22