Concurrent Programming Case Study_ S3 Metadata Requests _ by Joshua Robinson _ Medium

Uploaded by

bhaskar.jain20021814

12/01/2024, 19:49 Concurrent Programming Case Study: S3 Metadata Requests | by Joshua Robinson | Medium

Concurrent Programming Case Study: S3 Metadata Requests
Joshua Robinson
12 min read · Feb 10, 2022


Comparing Python, Go, and Rust

Recently a FlashBlade customer had a challenge with listing all custom metadata on
objects in a large bucket. A standard S3 LIST request does not return custom
metadata, therefore the task requires also issuing a HEAD request for each object.
With millions of objects in a bucket, the naive Python approach was serialized and
therefore painfully slow. The FlashBlade backend is an all-flash object store
designed for high throughput and metadata performance, so the question naturally
became how to better utilize the storage backend to complete the requests faster?

By rewriting the code in a compiled language with more concurrency, the
performance improved at least 50x compared to the naive Python implementation
and 3x to 4x versus the parallelized Python version.

This blog covers:

The performance gain (requests/sec) of Go and Rust relative to Python.

How to use FlashBlade as an S3 endpoint when coding in Python, Go, or Rust.

Basic concurrency patterns in these languages.

Writing performant and correct asynchronous code is hard. Inevitably, many threads
need to concurrently access and modify shared state, leading to lots of nasty bugs.

https://joshua-robinson.medium.com/concurrent-programming-case-study-s3-metadata-requests-a0f1dcd0ba7d 1/22

While Python is the easiest language to work with in general, it’s subtle and difficult to
write performant, parallelized Python code. In contrast, Go and Rust make writing
concurrent code relatively easy. In fact, writing concurrent Go code is no harder
than writing single-threaded Go code, which is unsurprising given the language’s
design goals. Writing concurrent Rust is more challenging but correctness is mostly
baked-in, unlike the equivalent C++ code (unless coroutines eventually save the
day!). Go and Rust both have significantly better writability/performance tradeoffs
than Python.

Results
Go and Rust are faster because they are modern compiled languages, whereas
Python is an interpreted language with limited concurrency due to the use of a
global interpreter lock (GIL). The best parallelized Python result was still 3x and 4x
slower than the Go and Rust versions, respectively.

And unsurprisingly, the naive single-threaded Python implementation is nearly two
orders of magnitude slower than the Go and Rust versions (50x and 65x difference
respectively).

It was surprising to me, though, that Rust was 30% faster than Go in this
embarrassingly-parallel and straightforward workload. Further, the Rust binary
used ~18% less CPU than the Go binary and 3x less CPU than the Python multi-
process version.


In all of these tests, the HEAD object latencies are consistently between 0.5 and 1.0 ms,
indicating that the backend FlashBlade is not overloaded. Synthetic benchmarking
shows that the FlashBlade can service at least 10x the highest request rate given
multiple clients.

Code Walkthroughs
One of the key challenges with this concurrency example is the need to scale to
billions of objects in a large bucket. This means spawning a new thread or process
per object is inefficient.

In some ways this is an “easy” concurrency problem because each concurrent task
issues a HEAD object request and then prints to stdout any custom metadata. No
data needs to be returned from each task, therefore there is no need to track
pending results and aggregate returned data. We only need to ensure that all
pending requests complete before finishing the program.

There are almost certainly ways to improve performance on each of these examples,
but I have tried to keep programming effort constant across these programs. I would
consider myself an average Python and Go programmer and an inexperienced Rust
programmer. But I have spent several years writing performant and highly
asynchronous C++ code, so I am always looking for languages that ease the pain of
writing concurrent code.


Python Baseline
First, I introduce the basic Python elements that make up the single-threaded
implementation in Python. This program retrieves object listings in groups of 1000
(default value) and then sequentially issues HEAD requests to each object before
moving on to the next LIST request. There is only ever one outstanding request,
LIST or HEAD, at a time.

First, install boto3:

pip install boto3

Configure the s3 client to access the FlashBlade’s data VIP (10.62.64.200 in my example):

import boto3
FB_DATAVIP='10.62.64.200'
s3 = boto3.resource('s3', endpoint_url='http://' + FB_DATAVIP)

Next, the following code lists all objects in a bucket using the ContinuationToken to
retrieve pages of 1000 keys per request:

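kwargs = {'Bucket': bucketname}  # request parameters; Bucket is required
while True:
    objlist = s3.meta.client.list_objects_v2(**kwargs)
    keys = [o['Key'] for o in objlist.get('Contents', [])]
    for key in keys:
        check_for_custom_metadata(s3, bucketname, key)

    # a missing NextContinuationToken means this was the last page
    try:
        kwargs['ContinuationToken'] = objlist['NextContinuationToken']
    except KeyError:
        break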

Note that there are helpers which simplify the listing code above, but I reuse this
per-page logic later for concurrency. Finally, a head_object() request retrieves
any custom metadata:


def check_for_custom_metadata(s3, bucketname, key):
    response = s3.meta.client.head_object(Bucket=bucketname, Key=key)
    # only print objects that actually carry custom metadata
    if response['Metadata']:
        print("{} {}".format(key, response['Metadata']))
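To make the control flow of the baseline easier to follow without a live endpoint, here is a minimal sketch of the same LIST-then-HEAD loop run against an in-memory stand-in. FakeS3Client and keys_with_metadata are hypothetical names invented for this illustration; they are not part of boto3, and the fake only mimics the few response fields the loop reads.

```python
class FakeS3Client:
    """In-memory stand-in for the S3 client (hypothetical, not a boto3 class)."""

    def __init__(self, objects, page_size=1000):
        self._meta = objects                 # key -> custom metadata dict
        self._keys = sorted(objects)
        self._page_size = page_size

    def list_objects_v2(self, Bucket, ContinuationToken=None):
        start = int(ContinuationToken) if ContinuationToken else 0
        end = start + self._page_size
        page = {'Contents': [{'Key': k} for k in self._keys[start:end]]}
        if end < len(self._keys):            # more pages remain
            page['NextContinuationToken'] = str(end)
        return page

    def head_object(self, Bucket, Key):
        return {'Metadata': self._meta[Key]}


def keys_with_metadata(client, bucketname):
    """Yield (key, metadata) for each object that has custom metadata."""
    kwargs = {'Bucket': bucketname}
    while True:
        objlist = client.list_objects_v2(**kwargs)
        for obj in objlist.get('Contents', []):
            response = client.head_object(Bucket=bucketname, Key=obj['Key'])
            if response['Metadata']:
                yield obj['Key'], response['Metadata']
        try:
            kwargs['ContinuationToken'] = objlist['NextContinuationToken']
        except KeyError:
            break                            # last page reached


# 2500 objects span three pages; every 1000th object has metadata.
objects = {'obj-%04d' % i: ({'owner': 'alice'} if i % 1000 == 0 else {})
           for i in range(2500)}
found = list(keys_with_metadata(FakeS3Client(objects), 'mybucket'))
print([key for key, _ in found])  # ['obj-0000', 'obj-1000', 'obj-2000']
```

The point of the sketch is that exactly one request is outstanding at any time, which is why the baseline is latency-bound.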

Python Multiprocessing
To issue the HEAD requests concurrently, I utilize the Python multiprocessing
library. There are two ways to run Python multiprocessing: threads or separate
processes. A ThreadPool creates multiple threads of execution within the same
process, but those threads are still subject to the GIL, which allows only a single
thread to execute Python code at a time. This means concurrent network requests can be
issued but processing is limited to a single thread. In contrast, process-level concurrency
creates independent processes, each with its own GIL, enabling true concurrency at
the cost of making information sharing between processes more challenging.

To utilize multiple threads, I create a ThreadPool and use apply_async() to
asynchronously execute each head request. The code below augments the previous
listing code:

from multiprocessing import cpu_count
# ThreadPool lives in the multiprocessing.pool submodule, which must be imported explicitly
from multiprocessing.pool import ThreadPool

p = ThreadPool(cpu_count())

… # same listing logic as above

    keys = [o['Key'] for o in objlist.get('Contents', [])]
    # start HEAD operations asynchronously
    for k in keys:
        p.apply_async(check_for_custom_metadata, (s3, bucketname, k))
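As a rough illustration of what the thread pool buys, the sketch below replaces the HEAD request with a sleep and compares a serial loop against ThreadPool.apply_async(). The function fake_head_request and the 50 ms latency are assumptions made up for this example. Because sleeping releases the GIL, this models only the network wait; real boto3 calls also spend CPU time in Python code, so the speedup here is an upper bound, not a prediction.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_head_request(key, latency=0.05):
    """Hypothetical stand-in for a HEAD request: just wait, then return."""
    time.sleep(latency)   # sleeping releases the GIL, as blocking socket I/O does
    return key


keys = ['key-%d' % i for i in range(16)]

# Serial: one outstanding "request" at a time.
start = time.perf_counter()
for k in keys:
    fake_head_request(k)
serial_secs = time.perf_counter() - start

# Threaded: up to 8 "requests" in flight at once.
pool = ThreadPool(8)
start = time.perf_counter()
pending = [pool.apply_async(fake_head_request, (k,)) for k in keys]
pool.close()
pool.join()
threaded_secs = time.perf_counter() - start

results = [p.get() for p in pending]
print("serial %.2fs, threaded %.2fs" % (serial_secs, threaded_secs))
```

Keeping the AsyncResult objects is optional in the article's use case, since output goes to stdout, but calling get() on them is one way to notice failed tasks.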

Due to the GIL, threading in Python gives a relatively small speedup (1.5x), so we
must use process-level concurrency. The core challenge in moving to process-level
concurrency is that the s3 client has internal state and cannot be shared across
multiple processes. Each process therefore needs its own s3 client, but because
creating a client is expensive, it should happen only once per process. When
creating the process pool, we can pass in a custom initialization function like this:

# make a per process s3_client
s3_client = None

def initialize():
    global s3_client
    s3_client = boto3.resource('s3',
                               aws_access_key_id=AWS_KEY,
                               aws_secret_access_key=AWS_SECRET,
                               use_ssl=False,
                               endpoint_url='http://' + FB_DATAVIP)

And now the “check_for_custom_metadata()” function uses the global variable for
the s3 client instead of a passed argument.

The process pool version then looks almost identical to the threading version.

import multiprocessing

p = multiprocessing.Pool(multiprocessing.cpu_count(), initialize)
… # Listing logic repeated here
    keys = [o['Key'] for o in objlist.get('Contents', [])]
    # Start async HEAD ops and then continue listing.
    for k in keys:
        p.apply_async(check_for_custom_metadata, (bucketname, k))

Finally, note that both the multiprocessing versions need to wait after the list
operation completes so that all outstanding HEAD requests complete before exiting.

p.close()
p.join()
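One caveat with fire-and-forget apply_async() is that an exception raised inside a task is stored in the discarded result object and never printed. The sketch below shows one way to surface such failures using the error_callback argument, which both ThreadPool.apply_async() and Pool.apply_async() accept. A ThreadPool is used here only so the example runs anywhere without process start-up concerns; check_key and record_failure are hypothetical names for this illustration.

```python
from multiprocessing.pool import ThreadPool

failures = []


def check_key(key):
    """Hypothetical worker that fails for one key to demonstrate error handling."""
    if key == 'bad-key':
        raise RuntimeError('HEAD failed for ' + key)
    return key


def record_failure(exc):
    # Called by the pool when a task raises; without it the error is silent.
    failures.append(str(exc))


pool = ThreadPool(4)
for k in ['a', 'bad-key', 'b']:
    pool.apply_async(check_key, (k,), error_callback=record_failure)
pool.close()   # no more tasks will be submitted
pool.join()    # block until every submitted task has finished
print(failures)  # ['HEAD failed for bad-key']
```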

The challenge of Python multiprocessing is sharing state across processes, as even
this one-way example demonstrates. And unfortunately even with this effort, the
performance is far below what is possible.

The complete Python multiprocessing code can be found here.

Go: Goroutines and Channels

The two Go primitives that enable “easy mode” concurrent programming are
goroutines and channels.

“A goroutine is a lightweight thread of execution.”

“Channels are the pipes that connect concurrent goroutines.”


A first attempt might be to spawn a new goroutine for each head request, but this
does not scale to large buckets with millions of objects due to the way that the S3
SDK creates TCP connections for each goroutine. In my testing, this resulted in slow
performance and running out of available TCP ports. There are many other
concurrency scenarios where creating a new goroutine per task is sufficient, like an
HTTP server handling requests.

The option I implemented instead was to create a small number of “worker”
goroutines, each of which retrieves object keys from a channel and issues HEAD
requests. A separate goroutine lists objects and adds them to the shared channel.
The number of workers limits the amount of concurrency in the system, which is
necessary to scale to billions of objects. The channel is a work queue in this
scenario, as shown in the diagram below.

The full code for the Go version can be found here.

Connecting to FlashBlade S3
Import the following in order to use the AWS SDK:

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/awserr"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)


First, configure the s3 client to connect to a FlashBlade by setting the endpoint
to the FlashBlade data VIP.

endpointUrl := "http://10.62.64.200"
s3Config := &aws.Config{
    Region:           aws.String("us-east-1"), // ignored
    DisableSSL:       aws.Bool(true),
    S3ForcePathStyle: aws.Bool(true),
    HTTPClient:       httpClient,
}
if endpointUrl != "" {
    s3Config.Endpoint = &endpointUrl
}

sess := session.Must(session.NewSession(s3Config))
svc := s3.New(sess)

The region can be set to any valid value as it will not be used for FlashBlade
connections. Disabling SSL is optional and not recommended for production
environments with proper certificates installed on the FlashBlade. Note the
HTTPClient setting, which I will discuss later in the section on TCP connection
reuse.

Concurrency Logic
The following function lists the keys in a bucket using ListObjectsPages, a
paginated helper that hides the continuation logic. This helper takes a function
object to be called with each page of up to 1000 returned keys. In this case,
each key is added to the shared channel.

func listToChannelAndClose(svc *s3.S3, bucketname *string, pfix string, channel chan string) {
    err := svc.ListObjectsPages(&s3.ListObjectsInput{
        Bucket: bucketname,
        Prefix: &pfix,
    }, func(p *s3.ListObjectsOutput, _ bool) (shouldContinue bool) {
        // send every key on this page to the workers
        for _, v := range p.Contents {
            channel <- *v.Key
        }
        return true
    })
    reportAWSError(err)
    close(channel)
}

Once the listing is completed, the channel is closed, indicating no more values will
be sent.

The main routine creates the channel and then starts the listing operation in a
separate goroutine using the function just described. This channel needs an explicit
size of more than 1000 so that the next list request can start before all HeadObject
requests from the previous page have completed. If the channel fills up, the sender
will block temporarily until more head requests have completed.

channel := make(chan string, 2048)

go listToChannelAndClose(svc, &bucketname, prefix, channel)

A fixed number of worker goroutines will read keys from the shared channel and
issue HeadObject requests.

var wg sync.WaitGroup
workerFunc := func() {
    defer wg.Done()
    // range exits once the channel is closed and drained
    for k := range channel {
        input := &s3.HeadObjectInput{
            Bucket: &bucketname,
            Key:    &k,
        }
        res, err := svc.HeadObject(input)
        reportAWSError(err)
        if len(res.Metadata) > 0 {
            printMetadata(k, res.Metadata)
        }
    }
}

The main program must not exit until all the work is done. This happens in
two phases. First, once the listing is completed, the channel is “closed”; each worker
goroutine continues reading keys from the channel and then exits once the
channel is exhausted. Second, a WaitGroup tracks all outstanding workers and holds up
completion of the main goroutine until all the work is done.

The last section of code starts the worker goroutines and then waits for each to
finish:

workerCount := 2 * runtime.NumCPU()

wg.Add(workerCount)
for i := 0; i < workerCount; i++ {
    go workerFunc()
}
wg.Wait()

The workerCount variable was originally set to the number of cores (16 in my case)
but I found the 2x increase in goroutines to yield 6% higher requests/sec.
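For readers coming from the Python versions, the same bounded work-queue pattern can be sketched with the standard library. This is only an analogy under stated assumptions: queue.Queue(maxsize=2048) plays the role of the buffered channel, a sentinel value per worker plays the role of close(channel), and joining the threads plays the role of wg.Wait(). The names lister, worker, and SENTINEL are invented for this illustration, and the HEAD request is replaced by appending to a list.

```python
import queue
import threading

SENTINEL = None                      # stands in for close(channel)
WORKER_COUNT = 4
work = queue.Queue(maxsize=2048)     # bounded, like make(chan string, 2048)
seen = []
seen_lock = threading.Lock()


def lister(pages):
    """Producer: push every key of every page, then signal completion."""
    for page in pages:
        for key in page:
            work.put(key)            # blocks while the queue is full
    for _ in range(WORKER_COUNT):
        work.put(SENTINEL)           # one end-of-work marker per worker


def worker():
    """Consumer: handle keys until the end-of-work marker arrives."""
    while True:
        key = work.get()
        if key is SENTINEL:
            return
        with seen_lock:              # placeholder for the HEAD request and print
            seen.append(key)


pages = [['k%d' % (p * 1000 + i) for i in range(1000)] for p in range(3)]
threads = [threading.Thread(target=worker) for _ in range(WORKER_COUNT)]
for t in threads:
    t.start()
lister(pages)
for t in threads:
    t.join()                         # like wg.Wait()
print(len(seen))  # 3000
```

The fixed worker count is what bounds the number of in-flight requests, regardless of how many keys the bucket holds.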

Sidebar: TCP Connection Reuse

This high-request-rate workload triggers port number exhaustion because, by default,
TCP connections are poorly reused. Briefly, the S3 SDK by default keeps the idle
connection pool far too small, retaining only two idle connections per host.

In AWS, there is a limit of 5,500 HEAD requests/sec per prefix, so the TCP connection
reuse issue is much less likely to arise. In contrast, FlashBlade imposes no per-client
limitations and has orders of magnitude higher metadata performance in even the
smallest configurations. Similarly, my Python programs have not experienced this
issue due to their far lower requests/sec rate.

To alleviate TCP port number exhaustion, the above code creates a custom HTTP
client for the AWS SDK via these instructions. Set MaxHostIdleConns (the setting in
those instructions that populates MaxIdleConnsPerHost on Go's http.Transport) to a much
higher number (e.g., 100) than the default value of 2 so that connections will be
reused instead of quickly closed.
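A back-of-the-envelope calculation shows why poor connection reuse runs out of ports at these request rates. The numbers below are assumptions, not measurements from the article: they are the common Linux defaults for the ephemeral port range (32768 to 60999) and a 60 second TIME_WAIT period, and the model ignores kernel options that permit earlier port reuse.

```python
# Assumed Linux defaults; the article does not state the client's settings.
EPHEMERAL_PORTS = 60999 - 32768 + 1   # default net.ipv4.ip_local_port_range
TIME_WAIT_SECS = 60                   # how long a closed socket keeps its port


def max_new_connections_per_sec(ports=EPHEMERAL_PORTS, time_wait=TIME_WAIT_SECS):
    """Sustained rate of fresh connections to one endpoint before ports run out."""
    return ports / time_wait


def requests_per_sec_supported(requests_per_connection):
    """Request rate achievable if each connection serves this many requests."""
    return max_new_connections_per_sec() * requests_per_connection


print(EPHEMERAL_PORTS)                        # 28232
print(round(max_new_connections_per_sec()))   # 471
```

Under these assumptions, a client that opens a fresh connection for nearly every request tops out at a few hundred requests/sec to a single endpoint, so sustaining much higher rates depends on each connection being reused for many requests.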

Rust: Tokio
Rust provides the concurrency primitives async and await. With these, concurrent code
can be written similarly to single-threaded code yet still execute with high
concurrency. First, the compiler transforms async code blocks into future
structures, and second, a “runtime” is required to run a future object and make
progress.

Asynchronous Rust code does not run without a runtime, and Tokio is the most
widely-used async runtime in the Rust ecosystem.


Rust futures are very different from goroutines or threads. Instead of each future
mapping to a thread, Tokio runs a set of threads that poll and advance futures until
they complete. Calling an async function does not execute code, but rather creates a
future which needs to be given to the Tokio runtime for execution.

The following code example opens and reads a file. Each of the two “.await?” calls
yields if its result is not ready yet, letting the runtime schedule another future
instead.

let mut f = File::open("foo.txt").await?;

let mut buffer = [0; 10];
let n = f.read(&mut buffer[..]).await?;

So instead of arranging callbacks to asynchronously respond when the file open
returns or when the read completes, the logic is kept together in a block that has
almost the same readability as a blocking, single-threaded version. Better
readability of asynchronous code means fewer bugs.
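The point that calling an async function only creates a future, and that nothing runs until a runtime drives it, has a close analogue in Python's asyncio, which may help readers more familiar with the Python examples. The sketch below is an analogy only, not a description of Tokio internals; head_object here is a made-up coroutine that sleeps instead of issuing a request.

```python
import asyncio

log = []


async def head_object(key):
    """Hypothetical coroutine standing in for an async HEAD request."""
    log.append('start ' + key)
    await asyncio.sleep(0.01)      # yield to the event loop until "ready"
    log.append('done ' + key)
    return key


async def main():
    pending = head_object('a')     # creates a coroutine object; nothing runs yet
    log.append('created')
    return await pending           # only now does the body execute


result = asyncio.run(main())       # the event loop plays the role of the runtime
print(log)  # ['created', 'start a', 'done a']
```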

Spawning Tokio tasks is a lightweight operation relative to spawning goroutines. As
a result, we can create a task for each key whereas we could not efficiently create a
goroutine per key.

Asynchronous Rust programming is a complex and involved topic; this is only a
basic introduction. The full Rust example code can be found here.

Connecting to FlashBlade S3
I use the AWS SDK for Rust, which is currently in developer preview and not yet
ready for production use. There are other widely-used community implementations,
rust-s3 and rusoto, but I use the AWS SDK in anticipation of it growing in
importance.

First, my Cargo.toml contents:

[dependencies]
aws-config = "0.6.0"
aws-sdk-s3 = "0.6.0"
aws-endpoint = "0.6.0"
tokio = { version = "1", features = ["full"] }
http = "0.2"
futures = "0.3"
awaitgroup = "0.6"

Next, configure an S3 client to connect to the FlashBlade S3 endpoint. As with other
examples, the endpoint needs to point to the FlashBlade data VIP and the region
needs to be valid but is otherwise ignored.

let endpoint = "http://10.62.64.200";
let region = Region::new("us-west-2"); // Value ignored

// load S3 access and secret keys from environment variables
let conf = aws_config::load_from_env().await;
let ep = Endpoint::immutable(Uri::from_static(endpoint));
let s3_conf = aws_sdk_s3::config::Builder::from(&conf)
    .endpoint_resolver(ep)
    .region(region)
    .build();
let client = Client::from_conf(s3_conf);

Concurrency Logic
The Rust listing loop is similar to the Python structure in the use of a continuation
token, and it creates a new task for the head_request() operation for each key.

let mut wg = WaitGroup::new();
let mut continuation_token = String::from("");

loop {
    let resp = client
        .list_objects_v2()
        .bucket(bucket)
        .prefix(prefix)
        .continuation_token(continuation_token)
        .send()
        .await?;
    for object in resp.contents().unwrap_or_default() {
        let key = object.key().unwrap_or_default();
        … // start new task here
    }
    if resp.is_truncated() {
        continuation_token = resp.next_continuation_token().unwrap().to_string();
    } else {
        break;
    }
}


The middle of this loop dispatches a new task for each key returned, where the task
issues a head_object() request using spawn().

…
let req = client.head_object().bucket(bucket).key(key).send();
let worker = wg.worker();
tokio::spawn(async move {
    let resp = match req.await {
        Ok(r) => r,
        Err(e) => panic!("HeadObject Error: {:?}", e),
    };
    let info = resp.metadata().unwrap();
    for (key, value) in info.iter() {
        println!("{}: {}", key, value);
    }
    worker.done();
});
…

Note that spawn returns a handle that can be used to retrieve results, but we ignore
that since the output goes to stdout.

The WaitGroup “worker()” call tracks the number of outstanding tasks. This
mechanism keeps an effective reference count of all outstanding tasks so that the
main thread can wait before exiting.

wg.wait().await;
Ok(())
}
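The task-per-key structure can be mirrored in Python's asyncio as a rough analogy: one task is created per key, and a single await on all of them takes the place of the wait group. The names head_object and list_and_head and the 10 ms sleep are assumptions for this sketch; the stats dictionary exists only to show how many tasks are in flight at once.

```python
import asyncio


async def head_object(key, stats):
    """Hypothetical per-key task; the sleep stands in for the HEAD round trip."""
    stats['in_flight'] += 1
    stats['peak'] = max(stats['peak'], stats['in_flight'])
    await asyncio.sleep(0.01)
    stats['in_flight'] -= 1
    stats['done'].append(key)


async def list_and_head(keys):
    stats = {'in_flight': 0, 'peak': 0, 'done': []}
    # One task per key, comparable to calling tokio::spawn for each key.
    tasks = [asyncio.create_task(head_object(k, stats)) for k in keys]
    await asyncio.gather(*tasks)   # wait for every task before returning
    return stats


stats = asyncio.run(list_and_head(['key-%d' % i for i in range(100)]))
print(len(stats['done']), stats['peak'])
```

Because nothing limits how many tasks start, the peak number of in-flight requests grows with the page size, which is the design trade-off of spawning per key instead of using a fixed set of workers.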

Unlike Python or Go, Rust by default compiles an unoptimized debug build, so it’s
important to compile with the “--release” flag for performance; otherwise requests/sec
is significantly lower (release mode is roughly 3x faster).

Summary
Writing applications that leverage concurrency effectively is a challenge. As an
example, issuing HEAD requests for every object in a large bucket requires an
application to juggle many concurrent in-flight operations. This is especially true
with a backend like FlashBlade, which is capable of handling hundreds of thousands
to millions of concurrent requests per second.


I compared concurrent implementations of this example program written using
Python multiprocessing, Go, and Rust. Both Go and Rust produce significantly
better requests/sec performance than concurrent Python (3x and 4x faster
respectively). Surprisingly, Rust is 30% faster than Go, despite this being the first
concurrent Rust program that I have written. And unsurprisingly, Go clearly wins in
the writability and readability of concurrent code.

I then walked through two interesting elements of each program: 1) how to
configure an S3 client to connect to a FlashBlade via endpoint override and 2) the
pattern used to generate the concurrency in each language. My hope is that these
code examples are educational to others hoping to write high performance S3
applications.

From Everand
Mastering Performance Optimization in Python: Unlock the Secrets of Expert-Level Skills
Larry Jones
No ratings yet
Async Programming in Python
No ratings yet
Async Programming in Python
86 pages
Mastering JavaScript Single Page Application Development
From Everand
Mastering JavaScript Single Page Application Development
Philip Klauzinski
No ratings yet
Building Modern Web Applications with ASP.NET Core Blazor: Learn how to use Blazor to create powerful, responsive, and engaging web applications (English Edition)
From Everand
Building Modern Web Applications with ASP.NET Core Blazor: Learn how to use Blazor to create powerful, responsive, and engaging web applications (English Edition)
Brian Ding
No ratings yet
CodeIgniter 1.7
From Everand
CodeIgniter 1.7
David Upton
No ratings yet
Python Advanced Programming: The Guide to Learn Python Programming. Reference with Exercises and Samples About Dynamical Programming, Multithreading, Multiprocessing, Debugging, Testing and More
From Everand
Python Advanced Programming: The Guide to Learn Python Programming. Reference with Exercises and Samples About Dynamical Programming, Multithreading, Multiprocessing, Debugging, Testing and More
Marcus Richards
No ratings yet
Learn Rust Programming: Safe Code, Supports Low Level and Embedded Systems Programming with a Strong Ecosystem (English Edition)
From Everand
Learn Rust Programming: Safe Code, Supports Low Level and Embedded Systems Programming with a Strong Ecosystem (English Edition)
Claus Matzinger
No ratings yet
AWS Solutions Architect Certification Case Based Practice Questions Latest Edition 2023
From Everand
AWS Solutions Architect Certification Case Based Practice Questions Latest Edition 2023
Exam OG
No ratings yet
Mastering Object-Oriented Programming with Python: Unlock the Secrets of Expert-Level Skills
From Everand
Mastering Object-Oriented Programming with Python: Unlock the Secrets of Expert-Level Skills
Larry Jones
No ratings yet
Ruby Concurrency Explained
No ratings yet
Ruby Concurrency Explained
7 pages
PHP, MySQL, & JavaScript All-In-One For Dummies
From Everand
PHP, MySQL, & JavaScript All-In-One For Dummies
Richard Blum
1/5 (1)
Learn C++
From Everand
Learn C++
Aishik Dutta
No ratings yet
Backend Development
From Everand
Backend Development
Kai Turing
No ratings yet
Distributed Computing with Python
From Everand
Distributed Computing with Python
Francesco Pierfederici
No ratings yet
Mastering Python Network Automation: Automating Container Orchestration, Configuration, and Networking with Terraform, Calico, HAProxy, and Istio
From Everand
Mastering Python Network Automation: Automating Container Orchestration, Configuration, and Networking with Terraform, Calico, HAProxy, and Istio
Tim Peters
No ratings yet
Azure Bicep QuickStart Pro: From JSON and ARM Templates to Advanced Deployment Techniques, CI/CD Integration, and Environment Management
From Everand
Azure Bicep QuickStart Pro: From JSON and ARM Templates to Advanced Deployment Techniques, CI/CD Integration, and Environment Management
Selina Threxan
No ratings yet
Groovy for Domain-Specific Languages, Second Edition: Extend and enhance your Java applications with domain-specific scripting in Groovy
From Everand
Groovy for Domain-Specific Languages, Second Edition: Extend and enhance your Java applications with domain-specific scripting in Groovy
Fergal Dearle
No ratings yet
Azure Bicep QuickStart Pro
From Everand
Azure Bicep QuickStart Pro
Selina Threxan
No ratings yet
Node.js, JavaScript, API: Interview Questions and Answers
From Everand
Node.js, JavaScript, API: Interview Questions and Answers
John Edward Cooper Berg
5/5 (1)
Concurrency Analysis Report
No ratings yet
Concurrency Analysis Report
42 pages
JavaScript File Handling from Scratch: A Practical Guide with Examples
From Everand
JavaScript File Handling from Scratch: A Practical Guide with Examples
William E. Clark
No ratings yet



