Amazon S3 Tables explained: Better storage for AWS Analytics workloads [#126]

DoiT · 1,176 views · 3 months ago

AWS Analytics expert Swapnil Bhoite breaks down Amazon S3 Tables, AWS's new fully managed Apache Iceberg solution that's revolutionizing data lake operations.

He discusses:
➡️ How the Parquet file format and the Apache Iceberg table format evolved into S3 Tables
➡️ A deep dive into automated compaction and snapshot management
➡️ Understanding namespaces and access control capabilities
➡️ Performance benefits for EMR ETL workloads
➡️ Migration strategies for existing AWS Glue Data Catalog implementations

Watch the full episode to learn about AWS's first cloud object store with built-in Apache Iceberg support and how it's transforming data lake analytics. You can also read Swapnil's breakdown here: https://medium.com/doit-international/introduction-to-amazon-s3-tables-and-table-buckets-22b12d63fc60

🎙️ Listen on Spotify: https://open.spotify.com/episode/2OlxEFUDceVIft4Auy2rdo?si=7fd9a6578dd94e59
🎙️ Listen on Apple Podcasts: https://podcasts.apple.com/il/podcast/cloud-masters/id1704008075?i=1000686683573

About the guest:

Swapnil is an AWS Big Data Engineer and AWS Glue SME at DoiT, helping companies architect their data solutions on AWS. Before joining DoiT, he worked at AWS, where he helped onboard enterprise customers to AWS data services such as EMR, Redshift, and OpenSearch, debugged issues in their solutions, and optimized their end-to-end data architectures, specializing in ETL/ELT pipelines.

Key Moments:

00:00 - Introduction
00:35 - S3 Tables vs. S3 buckets
01:00 - Benefits of Parquet file format
01:31 - Parquet vs. Apache Iceberg
03:33 - How we got to S3 Tables
06:09 - Compaction and snapshot management explained
09:46 - S3 Table namespaces
11:34 - S3 Tables cost and performance benefits
13:27 - EMR ETL performance with S3 Tables
15:47 - Migration considerations from Glue catalog
19:28 - S3 Tables security model and access controls
20:27 - Getting started with S3 Tables
22:34 - Future outlook

#aws #apacheiceberg #dataanalytics #cloudcomputing #datalake #awscloud #dataengineering #cloudnative #bigdata #AmazonS3 #S3Tables #CloudStorage