© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
STG366
Unlock the power of your data with Amazon S3 Metadata
Hiren Chandiramani (He/Him), Principal Worldwide Storage Specialist, Amazon Web Services
Huey Han (He/Him), Principal Product Manager, Amazon Web Services
Agenda
01 Data discovery challenges
02 S3 Metadata overview
03 Demo
04 Summary
Data discovery challenges
Data is growing faster than ever
S3 holds more than 400 trillion objects
Workloads on Amazon S3: analytics, data lakes, backups, website hosting, log files, machine learning training data, model checkpoints, compliance records, financial records, digital record preservation, mobile sync and storage, user-generated content, media assets, media master files, home video recordings, surveillance video/closed-circuit television, customer call-center records, medical images and records, pharmaceutical study data, DNA sequences, Internet of Things (IoT) sensor data, autonomous vehicle data, geospatial or lunar imagery, seismic and reservoir simulation data, oil and gas topography, and meteorological and environmental research.
New value from ML/AI training
Unstructured and structured data in Amazon S3 → Gen AI (train, customize) → value
Example uses: text generation, summarization, info extraction, data Q&A, chatbots, business insights generation
Gen AI unlocks new business value from unstructured data.
How to find actionable datasets at scale?
Metadata is key
Accurate metadata makes it easier to find data sets.
Metadata is only as useful as it is current.
Metadata stores live outside of storage, and are difficult to build, operate, and scale.
Introducing S3 Metadata
S3 Metadata tables (preview, Dec. 3, 2024)
Automatic metadata generation, accessible with simple SQL semantics
Includes system metadata, plus advanced fields like object tags and user-defined metadata
Stored in an Apache Iceberg table in your Amazon S3 table bucket
Metadata is generated in near real time as you create data in a bucket and becomes queryable in minutes
How it works
Data from a multitude of sources lands in your bucket.
Configure S3 Metadata on your bucket.
S3 maintains a queryable metadata table in your table bucket.
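Configuring S3 Metadata on a bucket comes down to one API call that names the destination table bucket and table. The Python sketch below assumes the preview-era `CreateBucketMetadataTableConfiguration` operation as exposed by boto3 (`create_bucket_metadata_table_configuration`) with an `S3TablesDestination`; the helper names are hypothetical, and the client is passed in so the functions can be exercised without AWS credentials.

```python
from typing import Any, Dict


def metadata_table_configuration(table_bucket_arn: str, table_name: str) -> Dict[str, Any]:
    """Build the MetadataTableConfiguration payload (assumed preview-era request shape)."""
    return {
        "S3TablesDestination": {
            "TableBucketArn": table_bucket_arn,
            "TableName": table_name,
        }
    }


def enable_s3_metadata(s3_client: Any, bucket: str, table_bucket_arn: str, table_name: str) -> Dict[str, Any]:
    """Ask S3 to record metadata for `bucket` into a table in the given table bucket.

    `s3_client` is expected to behave like boto3.client("s3"); this helper is hypothetical.
    """
    return s3_client.create_bucket_metadata_table_configuration(
        Bucket=bucket,
        MetadataTableConfiguration=metadata_table_configuration(table_bucket_arn, table_name),
    )
```

In practice `s3_client` would be `boto3.client("s3")` with a botocore version that includes S3 Metadata; the bucket name, table bucket ARN, and table name are placeholders you supply.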
Journal table
Change events to the bucket
AWS-managed table
Read-only for non-AWS principals
It’s a “system table” for your bucket
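Because the journal table stores change events rather than a snapshot, a consumer that wants the current state of the bucket has to replay those events. A minimal Python sketch, assuming an unversioned bucket (so `key` identifies an object), `record_type` values of `CREATE`, `UPDATE_METADATA`, and `DELETE`, and that for a given key a lexicographically larger `sequence_number` means a more recent event; verify these against the S3 Metadata documentation.

```python
from typing import Dict, Iterable, List


def current_objects(journal_rows: Iterable[dict]) -> List[dict]:
    """Collapse journal change events into the latest live record per key.

    Assumes an unversioned bucket and that sequence_number strings for the
    same key sort in event order.
    """
    latest: Dict[str, dict] = {}
    for row in journal_rows:
        prev = latest.get(row["key"])
        # Keep the event with the lexicographically largest sequence number.
        if prev is None or row["sequence_number"] > prev["sequence_number"]:
            latest[row["key"]] = row
    # Drop keys whose most recent event is a deletion, then sort by key.
    return sorted(
        (row for row in latest.values() if row["record_type"] != "DELETE"),
        key=lambda row: row["key"],
    )
```

For example, a key that is created and later has its metadata updated yields only the `UPDATE_METADATA` row, while a key whose last event is `DELETE` disappears from the result.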
21 system metadata fields are automatically recorded for all new objects in your bucket:

Metadata               Data type
bucket                 String
key                    String
sequence_number        String
record_type            String
record_timestamp       Timestamp
version_id             String
is_delete_marker       Boolean
size                   Long
last_modified_date     Timestamp
e_tag                  String
storage_class          String
is_multipart           Boolean
encryption_status      String
is_bucket_key_enabled  Boolean
kms_key_arn            String
checksum_algorithm     String
object_tags            Map<String, String>
user_metadata          Map<String, String>
…
requester              String
source_ip_address      String
request_id             String

Highlighted groups: object size, storage class, create date, encryption information, and client information.
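Since the table is queried with ordinary SQL, a small local experiment can show the kind of question the schema answers. As a stand-in for an Iceberg-aware engine such as Athena, this sketch loads a few scalar columns from the schema above into an in-memory SQLite table and totals object size by storage class. SQLite, the table name `journal`, and the sample rows are assumptions for illustration; the `Map` columns are not modeled.

```python
import sqlite3
from typing import List, Tuple

# Only a subset of the scalar columns from the schema is modeled here.
SCHEMA = """
CREATE TABLE journal (
    bucket TEXT,
    key TEXT,
    sequence_number TEXT,
    record_type TEXT,
    size INTEGER,
    storage_class TEXT
)
"""

BYTES_BY_STORAGE_CLASS = """
SELECT storage_class, COUNT(*) AS objects, SUM(size) AS total_bytes
FROM journal
WHERE record_type = 'CREATE'
GROUP BY storage_class
ORDER BY storage_class
"""


def bytes_by_storage_class(rows: List[Tuple]) -> List[Tuple[str, int, int]]:
    """Load journal-like rows into SQLite and aggregate created bytes per storage class."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO journal VALUES (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(BYTES_BY_STORAGE_CLASS).fetchall()
    finally:
        conn.close()
```

Because the journal records events, this query counts creation events, including objects that were later deleted; combining it with a replay of the latest event per key gives live totals instead.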
Queryable using a wide range of tools
Interactive query: Amazon Athena
Big data processing: Amazon EMR
Data warehousing: Amazon Redshift
ML-powered business intelligence: Amazon QuickSight
Open source tools: Apache Spark
Lake Formation – Fine-grained access control
Amazon Athena, AWS Glue, AWS Lake Formation, Amazon Redshift
Demo
SmugMug/Flickr
"Imagine flying a time machine through your Amazon S3 data. At SmugMug and Flickr, we've stored over 22 years of our customers' photos, hundreds of billions of objects, in S3. The new S3 Metadata feature helps us explore our S3 object metadata easily and affordably."
Andrew Shieh, Principal Engineer, SmugMug
Custom metadata
Object tags: mutable; dedicated APIs; dedicated permissions; paid Amazon S3 feature
User-defined metadata: immutable, only added on PUT; no cost
Joined tables: mutable; fully customized; detached from objects; cost of table storage
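The mutability difference between the first two options shows up in how they are written. This Python sketch assumes boto3-style S3 clients: user-defined metadata goes in the `Metadata` argument of `put_object` (sent as `x-amz-meta-*` headers), while tags go through the separate `put_object_tagging` call. The helper names are hypothetical, and joined tables are not shown because their schema is fully up to you.

```python
from typing import Any, Dict


def upload_with_metadata(s3_client: Any, bucket: str, key: str, body: bytes, metadata: Dict[str, str]) -> None:
    # User-defined metadata is sent as x-amz-meta-* headers and can only be set
    # when the object is written; changing it later means rewriting (copying) the object.
    s3_client.put_object(Bucket=bucket, Key=key, Body=body, Metadata=metadata)


def replace_tags(s3_client: Any, bucket: str, key: str, tags: Dict[str, str]) -> None:
    # Object tags use their own API and can be changed at any time without rewriting
    # the object. PutObjectTagging replaces the whole tag set, not individual tags.
    tag_set = [{"Key": k, "Value": v} for k, v in sorted(tags.items())]
    s3_client.put_object_tagging(Bucket=bucket, Key=key, Tagging={"TagSet": tag_set})
```

With `boto3.client("s3")` as `s3_client`, both helpers issue real requests; passing a stub client keeps them testable offline.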
Video generated via Amazon Bedrock is automatically annotated with custom metadata.
In Amazon Bedrock’s PUT request:
    x-amz-meta-content-source: AmazonBedrock
    x-amz-meta-content-model-id: arn:aws:::model/xyz-v1
Query from the Amazon S3 Metadata journal table:
    SELECT bucket, key, user_metadata, record_type
    FROM aws_s3_metadata.my_metadata_table
    WHERE user_metadata['content-source'] = 'AmazonBedrock'
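A query like this can also be submitted programmatically. The sketch below assumes a boto3-style Athena client and uses the `user_metadata` and `record_type` column names from the schema table. The catalog name and the S3 output location are placeholders, since the catalog under which the table bucket appears depends on how it is integrated with your analytics services; the helper name is hypothetical.

```python
from typing import Any

# Column names follow the journal table schema (user_metadata, record_type).
QUERY = """
SELECT bucket, key, user_metadata, record_type
FROM aws_s3_metadata.my_metadata_table
WHERE user_metadata['content-source'] = 'AmazonBedrock'
"""


def start_bedrock_content_query(athena_client: Any, catalog: str, output_location: str) -> str:
    """Submit the query to Athena and return its execution ID.

    `catalog` and `output_location` are placeholders supplied by the caller.
    """
    response = athena_client.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Catalog": catalog, "Database": "aws_s3_metadata"},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

The returned ID can then be polled with `get_query_execution` and the rows fetched with `get_query_results`.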
Demo – Part 2
Augmenting with content metadata
Upload images with user-defined metadata to an S3 general purpose bucket.
A New Object Created event invokes an AWS Lambda function, which uses Amazon Rekognition and writes the results to Amazon DynamoDB (PutItem).
The DynamoDB change stream, containing the enriched events, feeds Amazon Managed Service for Apache Flink, which writes to an S3 table bucket that contains content metadata.
A second S3 table bucket contains the S3 Metadata table.
Athena queries the user-defined metadata and content metadata together.
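The Lambda step of this pipeline can be sketched in Python. This assumes boto3-style Rekognition and DynamoDB clients, the standard S3 event notification layout, and a hypothetical DynamoDB table named `image-content-metadata` keyed by bucket and key; the label limits are illustrative choices, and the downstream Flink and table-bucket steps are not shown.

```python
import urllib.parse
from typing import Any, Dict, List

TABLE_NAME = "image-content-metadata"  # hypothetical DynamoDB table name


def label_item(bucket: str, key: str, labels: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Shape Rekognition labels as a DynamoDB item in the low-level attribute-value format."""
    return {
        "bucket": {"S": bucket},
        "key": {"S": key},
        # A list attribute (L) is used because a string set (SS) may not be empty.
        "labels": {"L": [{"S": label["Name"]} for label in labels]},
    }


def handle_object_created(event: Dict[str, Any], rekognition: Any, dynamodb: Any) -> int:
    """Label each newly created image and write one item per object; returns items written."""
    written = 0
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        response = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxLabels=10,
            MinConfidence=80,
        )
        dynamodb.put_item(TableName=TABLE_NAME, Item=label_item(bucket, key, response["Labels"]))
        written += 1
    return written
```

In a deployed function, a `lambda_handler(event, context)` would create `boto3.client("rekognition")` and `boto3.client("dynamodb")` once at module load and pass them in; keeping them as parameters makes the logic testable with stubs.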
Roche
"S3 Metadata accelerates our generative AI initiatives. As we build LLM applications such as internal chatbots for our teams, unstructured data like PDFs are becoming increasingly valuable."
Yannick Misteli, Pharma Commercial Head of Engineering, Roche
Additional resources
Jeff Barr’s blog
Documentation
Pricing page
Thank you! Please complete the session
survey in the mobile app
Hiren Chandiramani Huey Han