Examples of Apache Flink® v2.1 applications showcasing the DataStream API, Table API in Java and Python, and Flink SQL, featuring AWS, GitHub, Terraform, Streamlit, and Apache Iceberg.
Updated Jan 13, 2026 - Java
Automation framework to catalog AWS data sources using Glue
Smart City Realtime Data Engineering Project
This project repo 📺 offers a pipeline built to manage, process, and analyze YouTube video data using AWS services, covering both structured statistics and trending key metrics.
Tool to migrate Delta Lake tables to Apache Iceberg using AWS Glue and S3
An ETL (Extract, Transform, Load) pipeline built on AWS using the Spotify API.
Interactive visualizations built with Streamlit, powered by Apache Flink in batch mode to surface insights from data.
Prototype of AWS data lake reference implementation written in Python and Spark: https://aws.amazon.com/solutions/implementations/data-lake-solution/
Creating an audit table for a DynamoDB table using CloudTrail, Kinesis Data Streams, Lambda, S3, Glue, Athena, and CloudFormation
Working with Glue Data Catalog and Running the Glue Crawler On Demand
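Running a Glue crawler on demand typically means starting it via the API and polling until it returns to the READY state. A minimal sketch with boto3 is below; the crawler name is a placeholder, the call requires AWS credentials, and the backoff values are illustrative.

```python
import time


def backoff_schedule(attempts=5, base=15, cap=120):
    """Polling delays in seconds, with capped exponential backoff."""
    return [min(base * 2 ** i, cap) for i in range(attempts)]


def run_crawler_on_demand(crawler_name):
    """Start an AWS Glue crawler and poll until it is READY again.

    `crawler_name` is a placeholder; valid AWS credentials are assumed.
    """
    import boto3  # imported here so the pure helper above works offline

    glue = boto3.client("glue")
    glue.start_crawler(Name=crawler_name)
    for delay in backoff_schedule():
        state = glue.get_crawler(Name=crawler_name)["Crawler"]["State"]
        if state == "READY":
            return True
        time.sleep(delay)
    return False
```

The capped backoff keeps the polling loop cheap for long-running crawls without hammering the Glue API.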
Enterprise track: Step Functions/EventBridge + Glue + data quality on top of the v1 serverless ELT
Unveiling job market trends with Scrapy and AWS
This project demonstrates how to use Terraform to enable Tableflow in Kafka to generate and store Iceberg table files in an AWS S3 bucket. It then configures Snowflake to read the Iceberg tables via the AWS Glue Data Catalog and the same S3 bucket where Tableflow writes the Iceberg files.
Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog
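Iceberg's `register_table` procedure attaches an existing table's metadata file to a catalog entry, which is how a table already in S3 can be made visible through the Glue Data Catalog. A sketch of building and issuing that call from PySpark follows; the catalog name `glue_catalog`, the table name, and the S3 path are placeholders, and the catalog is assumed to be configured with `org.apache.iceberg.aws.glue.GlueCatalog`.

```python
def register_table_sql(catalog, table, metadata_file):
    """Build the Iceberg register_table procedure call for spark.sql()."""
    return (
        f"CALL {catalog}.system.register_table("
        f"table => '{table}', "
        f"metadata_file => '{metadata_file}')"
    )


def register_in_glue(spark, metadata_file):
    # 'glue_catalog' and 'analytics.events' are illustrative names; the
    # session must have the Glue-backed Iceberg catalog configured.
    spark.sql(register_table_sql("glue_catalog", "analytics.events", metadata_file))
```

Registering points the catalog at the newest `metadata.json` rather than copying data, so the table's files stay where they are in S3.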
🌟 Build a production-lite serverless ELT pipeline on AWS, enabling efficient data ingestion and transformation from S3 to Parquet with minimal overhead.
Developed an ETL pipeline for real-time ingestion of stock market data from the stock-market-data-manage.onrender.com API. Engineered the system to store data in Parquet format for optimized query processing and incorporated data quality checks to ensure accuracy prior to visualization.
This project uses Terraform and GitHub Actions to build and validate a data infrastructure on AWS. The CI pipeline automates code verification for provisioning an S3 bucket and a Glue catalog, establishing a solid, version-controlled foundation for data engineering projects.
This project showcases a complete data engineering pipeline on AWS, following best practices in data ingestion, transformation, and analytics — ready for real-world production use or integration with BI tools such as QuickSight or Power BI.
This project creates a scalable data pipeline to analyze YouTube data from Kaggle using AWS services: S3, Glue, Lambda, Athena, and QuickSight. It processes raw JSON and CSV files into cleansed, partitioned datasets, integrates them with ETL workflows, and catalogs data for querying. Final insights are visualized in QuickSight dashboards.
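The cleansing step in a pipeline like this usually means grouping raw JSON records by a partition key and emitting one cleaned file per partition, using Hive-style `key=value` paths that Glue and Athena expect. A stdlib-only sketch is below; the field names (`region`, `video_id`, etc.) are illustrative, not the actual Kaggle schema.

```python
import csv
import io
import json
from collections import defaultdict


def partition_videos(raw_json_lines, keys=("video_id", "title", "views")):
    """Group raw JSON video records by 'region' into per-partition CSVs.

    Returns a dict mapping Hive-style partition paths to CSV text.
    """
    partitions = defaultdict(list)
    for line in raw_json_lines:
        rec = json.loads(line)
        partitions[rec["region"]].append([rec.get(k) for k in keys])

    out = {}
    for region, rows in partitions.items():
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(keys)
        writer.writerows(rows)
        # Hive-style partition key, as Glue crawlers and Athena expect
        out[f"region={region}/part-000.csv"] = buf.getvalue()
    return out
```

In the real pipeline each returned value would be uploaded to S3 under its partition path, after which a Glue crawler can catalog the partitions for Athena.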