
STG366 NEW - Unlock The Power of Your Data With Amazon S3 Metadata

The document discusses the challenges of data discovery and the importance of metadata in managing large datasets stored in Amazon S3. It introduces S3 Metadata, which allows for automatic metadata generation and querying using SQL, enhancing the ability to find actionable datasets. The presentation also highlights use cases and benefits of S3 Metadata for organizations like SmugMug and Roche in leveraging unstructured data for AI initiatives.

Uploaded by Ramkumar

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.

STG366

Unlock the power of your data with Amazon S3 Metadata

Hiren Chandiramani (He/Him), Principal Worldwide Storage Specialist, Amazon Web Services
Huey Han (He/Him), Principal Product Manager, Amazon Web Services

Agenda

01 Data discovery challenges

02 S3 Metadata overview

03 Demo

04 Summary

Data discovery challenges

Data is growing faster than ever
S3 holds more than 400 trillion objects

Amazon S3
Geospatial or lunar imagery
Internet of Things (IoT) sensor data
Customer call-center records
Medical images and records
Analytics
Mobile sync and storage
Digital record preservation
Compliance records
Data lakes
Media master files
Home video recordings
Pharmaceutical study data
Model checkpoints
Seismic and reservoir simulation data
DNA sequences
Backups
Website hosting
Surveillance video/closed-circuit television
Log files
Machine learning training data
Media assets
User-generated content
Financial records
Autonomous vehicle data
Meteorological and environmental research
Oil and gas topography

New value from ML/AI training

Unstructured and structured data in Amazon S3 feed Gen AI: train, customize, value.
Gen AI use cases: text generation, summarization, info extraction, data Q&A, chatbot, business insights generation.
Gen AI unlocks new business value from unstructured data.
How to find actionable datasets at scale?

Metadata is key

Accurate metadata makes it easier to find data sets.
Metadata is only as useful as it is current.
Metadata stores live outside of storage, and are difficult to build, operate, and scale.

Introducing S3 Metadata

S3 Metadata Tables (preview, Dec. 3, 2024)

Automatic metadata generation, accessible with simple SQL semantics
Includes system metadata, plus advanced fields like object tags and user-defined metadata
Stored in an Apache Iceberg table in your Amazon S3 table bucket
Metadata is generated in near real time as you create data in a bucket, and becomes queryable in minutes

How it works

A multitude of data sources write to your bucket. You configure S3 Metadata on your bucket, and S3 maintains a queryable metadata table in your table bucket.
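A minimal Python sketch of the "configure S3 Metadata on your bucket" step. It assumes boto3's S3 client exposes the preview-era CreateBucketMetadataTableConfiguration operation as `create_bucket_metadata_table_configuration`, and that the table bucket already exists. The bucket name, table bucket ARN, and table name are hypothetical; the client is passed in so the request shape can be checked without AWS credentials. Verify the field names against the current Amazon S3 API reference before relying on them.

```python
# Sketch: enable S3 Metadata on a general purpose bucket.
# ASSUMPTION: boto3's S3 client exposes the preview-era
# CreateBucketMetadataTableConfiguration operation as
# create_bucket_metadata_table_configuration; check the current S3 API reference.


def metadata_table_request(bucket: str, table_bucket_arn: str, table_name: str) -> dict:
    """Build the keyword arguments for the metadata table configuration call."""
    return {
        "Bucket": bucket,  # the general purpose bucket whose changes are recorded
        "MetadataTableConfiguration": {
            "S3TablesDestination": {
                "TableBucketArn": table_bucket_arn,  # an existing table bucket
                "TableName": table_name,  # the metadata table S3 creates there
            }
        },
    }


def enable_s3_metadata(s3_client, bucket: str, table_bucket_arn: str, table_name: str) -> dict:
    """Send the configuration request through an injected client and return what was sent."""
    request = metadata_table_request(bucket, table_bucket_arn, table_name)
    s3_client.create_bucket_metadata_table_configuration(**request)
    return request
```

In practice you would pass `boto3.client("s3")` as `s3_client`; keeping the request builder separate makes it easy to review exactly what will be sent.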

Journal table

Records change events to the bucket
AWS-managed table, read-only for non-AWS principals
It’s a “system table” for your bucket

21 system metadata fields are automatically recorded for all new objects in your bucket, including object size, storage class, create date, encryption information, and client information.

Metadata                 Data type
bucket                   String
key                      String
sequence_number          String
record_type              String
record_timestamp         Timestamp
version_id               String
is_delete_marker         Boolean
size                     Long
last_modified_date       Timestamp
e_tag                    String
storage_class            String
is_multipart             Boolean
encryption_status        String
is_bucket_key_enabled    Boolean
kms_key_arn              String
checksum_algorithm       String
object_tags              Map<String, String>
user_metadata            Map<String, String>
requester                String
source_ip_address        String
request_id               String
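Because the journal table is a log of change events rather than a snapshot, answering "what is in the bucket right now?" means replaying it. A minimal in-memory Python sketch, assuming an unversioned bucket, record_type values of CREATE, UPDATE_METADATA, and DELETE, sequence_number strings that sort in event order for a given key, and non-DELETE rows that carry the object's current metadata. `JournalRecord` and `current_objects` are hypothetical names; against the real table you would express the same logic in SQL.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class JournalRecord:
    """A hypothetical subset of the journal table columns listed above."""
    bucket: str
    key: str
    sequence_number: str
    record_type: str  # assumed values: "CREATE", "UPDATE_METADATA", "DELETE"
    size: Optional[int] = None
    storage_class: Optional[str] = None
    object_tags: Dict[str, str] = field(default_factory=dict)
    user_metadata: Dict[str, str] = field(default_factory=dict)


def current_objects(records: Iterable[JournalRecord]) -> Dict[Tuple[str, str], JournalRecord]:
    """Replay change events per object and keep the latest row for objects that still exist.

    Assumes an unversioned bucket, sequence_number strings that sort in event
    order for a given key, and non-DELETE rows that carry current metadata.
    """
    latest: Dict[Tuple[str, str], JournalRecord] = {}
    for rec in sorted(records, key=lambda r: (r.bucket, r.key, r.sequence_number)):
        object_id = (rec.bucket, rec.key)
        if rec.record_type == "DELETE":
            latest.pop(object_id, None)  # the object no longer exists
        else:
            latest[object_id] = rec  # a newer row replaces the older one
    return latest
```

Sorting by key before sequence number means events for different objects never need to be compared with each other, which matches the assumption that sequence numbers are only meaningful per object.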

Queryable using a wide range of tools

Interactive query: Amazon Athena
Big data processing: Amazon EMR
Data warehousing: Amazon Redshift
ML-powered business intelligence: Amazon QuickSight
Open source tools: Apache Spark
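As one example of the Athena path, here is a Python sketch that submits a query, polls until it reaches a terminal state, and returns the rows. It uses the real Athena operations start_query_execution, get_query_execution, and get_query_results through an injected client, and assumes the metadata table is already reachable from Athena under a fully qualified name in the SQL. The output location is hypothetical. Only the first page of results is read, and for SELECT queries Athena returns the column names as the first row.

```python
import time

# States after which an Athena query will not change again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(athena, sql, output_location, poll_seconds=1.0, max_polls=300):
    """Start a query, wait for it to finish, and return rows as lists of strings."""
    started = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    for _ in range(max_polls):
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
    else:
        raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
    if state != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"query {query_id} ended in {state}: {reason}")
    # Only the first page is read here; use NextToken for larger result sets.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
```

Passing `boto3.client("athena")` as `athena` runs it for real; the injected client also keeps the polling logic testable offline.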

Lake Formation – Fine-grained access control

Services shown: Amazon Athena, AWS Glue, AWS Lake Formation, Amazon Redshift
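One way to apply fine-grained control is a column-level grant that hides the client-information columns (requester, source_ip_address, request_id) from analysts. This Python sketch builds a Lake Formation GrantPermissions request in the ColumnWildcard / ExcludedColumnNames form and sends it through an injected client. The principal ARN and catalog ID are hypothetical, the database and table names reuse aws_s3_metadata and my_metadata_table from the query in this deck, and the catalog ID format for S3 table buckets is an assumption to check in the Lake Formation documentation.

```python
# Client-information columns from the journal table schema.
CLIENT_INFO_COLUMNS = ("requester", "source_ip_address", "request_id")


def column_filtered_grant(principal_arn, catalog_id, database, table,
                          hidden_columns=CLIENT_INFO_COLUMNS):
    """Build a GrantPermissions request: SELECT on all columns except hidden_columns."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "CatalogId": catalog_id,  # ASSUMPTION: check the S3 Tables catalog ID format
                "DatabaseName": database,
                "Name": table,
                "ColumnWildcard": {"ExcludedColumnNames": list(hidden_columns)},
            }
        },
        "Permissions": ["SELECT"],
    }


def grant_select_without_client_info(lakeformation, principal_arn, catalog_id, database, table):
    """Send the grant through an injected Lake Formation client and return the request."""
    request = column_filtered_grant(principal_arn, catalog_id, database, table)
    lakeformation.grant_permissions(**request)
    return request
```

With `boto3.client("lakeformation")` as the client, queries from Athena or Redshift by that principal would then see every journal column except the excluded ones.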

Demo

SmugMug/Flickr

"Imagine flying a time machine through your Amazon S3 data. At SmugMug and Flickr, we've stored over 22 years of our customers' photos, hundreds of billions of objects, in S3. The new S3 Metadata feature helps us to easily explore our S3 object metadata easily and affordably."

Andrew Shieh, Principal Engineer, SmugMug

Custom metadata

Object tags: mutable; dedicated APIs; dedicated permissions; paid Amazon S3 feature
User-defined metadata: immutable; only added on PUT; no cost
Joined tables: mutable; fully customized; detached from objects; cost of table storage
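To make the difference concrete: on a raw PUT request, user-defined metadata travels as `x-amz-meta-*` headers, while tags travel as a URL-encoded query string in the `x-amz-tagging` header and can later be changed through the dedicated tagging APIs. A small Python sketch of the header construction; `put_object_headers` is a hypothetical helper, and encoding spaces as `%20` is a conservative choice rather than something the slide specifies. With boto3 you would instead pass `Metadata={...}` and `Tagging="k=v"` to `put_object`, which adds the prefix for you.

```python
from urllib.parse import quote, urlencode


def put_object_headers(user_metadata: dict, tags: dict) -> dict:
    """Build S3 PUT headers: x-amz-meta-* for user-defined metadata, x-amz-tagging for tags."""
    # User-defined metadata is fixed at PUT time; changing it later requires rewriting the object.
    headers = {f"x-amz-meta-{name}": value for name, value in user_metadata.items()}
    if tags:
        # Tags are sent as a URL-encoded query string, e.g. "project=demo%20reel".
        headers["x-amz-tagging"] = urlencode(tags, quote_via=quote)
    return headers
```

For example, `put_object_headers({"content-source": "AmazonBedrock"}, {})` yields the same `x-amz-meta-content-source` header shown on the next slide.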

Video generated via Amazon Bedrock is automatically annotated with custom metadata

In Amazon Bedrock’s PUT request:
    x-amz-meta-content-source: AmazonBedrock
    x-amz-meta-content-model-id: arn:aws:::model/xyz-v1

Query from the Amazon S3 Metadata journal table:
    SELECT bucket, key, user_metadata, record_type
    FROM aws_s3_metadata.my_metadata_table
    WHERE user_metadata['content-source'] = 'AmazonBedrock'
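When the filter value comes from user input, build the string literal rather than pasting it in. This Python sketch produces a query like the one on this slide. It assumes a Trino-based engine such as Athena, where `element_at(map, key)` returns NULL for a missing key, whereas the `map[key]` subscript raises an error on objects that lack that key. The function names are hypothetical, and the table name is interpolated as-is, so it must come from trusted configuration.

```python
def sql_string_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def metadata_query(table: str, metadata_key: str, expected: str) -> str:
    """Select journal rows whose user-defined metadata has metadata_key == expected."""
    # element_at returns NULL (not an error) when an object lacks the key.
    return (
        "SELECT bucket, key, user_metadata, record_type\n"
        f"FROM {table}\n"  # trusted identifier, not escaped
        f"WHERE element_at(user_metadata, {sql_string_literal(metadata_key)})"
        f" = {sql_string_literal(expected)}"
    )
```

Note that the metadata key is `content-source`, without the `x-amz-meta-` prefix used in the PUT headers.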

Demo – Part 2

Augmenting with content metadata

Upload images with user-defined metadata to an S3 general purpose bucket.
The New Object Created event invokes an AWS Lambda function, which uses Amazon Rekognition to enrich the event.
Lambda writes the enriched events to Amazon DynamoDB (PutItem).
The DynamoDB change stream feeds Amazon Managed Service for Apache Flink, which writes to an S3 table bucket that contains content metadata.
Another S3 table bucket contains the S3 Metadata table.
Amazon Athena queries user-defined metadata and content metadata.
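A sketch of the Lambda step in this pipeline, assuming the function receives standard S3 event notifications (a `Records` list) and writes through the boto3 DynamoDB resource API, whose `put_item` needs `Decimal` rather than `float` for numbers. Rekognition's `detect_labels` reads the image directly from S3. `make_handler`, the `pk` partition key, and the 80 percent confidence threshold are hypothetical; the clients are injected so the parsing and item-building logic can be tested without AWS.

```python
from decimal import Decimal
from urllib.parse import unquote_plus

MIN_CONFIDENCE = 80.0  # hypothetical threshold for keeping a label


def objects_from_s3_event(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded (spaces as '+') in S3 notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def content_metadata_item(bucket, key, labels_response, min_confidence=MIN_CONFIDENCE):
    """Turn a detect_labels response into a DynamoDB item of content metadata."""
    labels = [
        {"name": label["Name"], "confidence": Decimal(str(round(label["Confidence"], 2)))}
        for label in labels_response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    return {"pk": f"{bucket}/{key}", "bucket": bucket, "key": key, "labels": labels}


def make_handler(rekognition, table):
    """Build a Lambda handler around a Rekognition client and a DynamoDB Table resource."""
    def handler(event, context):
        for bucket, key in objects_from_s3_event(event):
            response = rekognition.detect_labels(
                Image={"S3Object": {"Bucket": bucket, "Name": key}},
                MinConfidence=MIN_CONFIDENCE,
            )
            table.put_item(Item=content_metadata_item(bucket, key, response))
    return handler
```

In a deployed function you might create `handler = make_handler(boto3.client("rekognition"), boto3.resource("dynamodb").Table("image-labels"))` at module load, where the table name is again hypothetical. Keying items by bucket and object key lets the downstream content-metadata table be joined to the S3 Metadata table on `bucket` and `key`.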

Roche

"S3 Metadata accelerates our generative AI initiatives. As we build LLM applications such as internal chatbots for our teams, unstructured data like PDFs are becoming increasingly valuable."

Yannick Misteli, Pharma Commercial Head of Engineering, Roche

Additional resources

Jeff Barr’s blog
Documentation
Pricing page

Thank you! Please complete the session survey in the mobile app.

Hiren Chandiramani
Huey Han
