This repository provides templates and code to establish a comprehensive demo environment that showcases data integration from SAP S/4HANA to Snowflake. The solution enables you to create dataflows from a SAP S/4HANA fully activated appliance, copy data to your own Snowflake database, model the data for transformation into a data mart, and implement dynamic tables for automated refreshes.
The extraction layer demonstrates multiple methods to access SAP data, including standard SAP extractors (using 0FI_AR_4 for accounts receivable as an example), custom views on core tables (demonstrated with the KNA1 customer table), and direct access to ABAP CDS views (exemplified with I_MATERIAL). For transformation, the solution leverages large language models (LLMs) to automatically scan tables and build business-friendly semantic definitions, including intelligent detection of table relationships and join patterns.
Snowflake dynamic tables provide a declarative approach to transformation, allowing you to write SQL transform statements with configurable refresh intervals (such as 5-minute cycles) while Snowflake handles the change data capture process automatically. Once your data is processed and structured, it becomes readily available for consumption through Snowflake Cortex or AWS AI services, enabling advanced analytics and artificial intelligence applications with minimal additional configuration.
- AWS Account with permissions to create and manage:
  - AWS Glue resources (jobs, connections, development endpoints)
  - IAM roles and policies
  - Amazon S3 buckets
  - AWS Secrets Manager secrets
  - CloudWatch logs
- SAP System with:
  - OData services enabled
  - Appropriate user credentials with read permissions
- Snowflake Account with:
  - Database and schema created
  - Warehouse configured
  - User with appropriate privileges
- Development Tools:
  - AWS CLI installed and configured
  - Python 3.7 or higher
  - Git client
Use one of the following options to activate an S/4HANA demo system, or bring your own system that already has connectivity established with your AWS account:
- SAP Cloud Application Library
- AWS Launch Wizard (Create deployment: SAP NetWeaver on SAP HANA, single-instance deployment; allow 4-5 hours, and the system can be used indefinitely). To order the appliance for download, see SAP Note 2041140.
You also need to obtain the SAP Test and Demo User starter package (TD_7016852) from the SAP Partner Pricing app. If you do not install the licenses, the system (whether CAL or on-premise) will stop running after 30-90 days.
Prepare the data sources (CDS views, tables, BW extractors) for initial and delta consumption through the ODP OData v2 API. In SAP GUI, use the following three methods to activate data extraction through the ODP OData API:
- BW extractor for Accounts Receivable (0FI_AR_4)
- View on table for Customers (KNA1)
- CDS view for Material (I_MATERIAL)
Go to Snowflake.com, select Start for Free and follow the steps to create your account.
- Log in to the AWS Console.
- Configure AWS Glue to connect to a publicly signed S/4HANA URL as described in the blog, adapting the steps to AWS Glue rather than Amazon AppFlow. In the step "Allow principals to access VPC Endpoint Services", add the principal glue.amazonaws.com, not appflow.amazonaws.com.
For sample SAP data extraction, we use the SAP BW extractor 0FI_AR_4 (accounts receivable), the SAP table KNA1 (customer master), and the ABAP CDS view I_MATERIAL (material).
To use AWS Glue with SAP S/4HANA (S4H):
- Run the IAM_Roles CloudFormation template, which creates the necessary AWSGlueServiceRole and Amazon S3 buckets.
- Manually create the S4H connection in AWS Glue.
- Run the SAP to AWS data extraction CloudFormation template, which creates the corresponding ETL jobs in AWS Glue. Note that you need to run this CloudFormation template separately for each source SAP OData entity. The subsequent code uses FI/AR, Customers, and Products data, but you can adapt it to any dataset you like.
- Run the AWS Glue ETL job(s).
- Check the extracted SAP data files landed in the S3 bucket.
Use the [Snowflake SQL worksheet](/assets/setup.sql) to do the following (a sketch of these steps appears after the list):
- Activate a secure integration between AWS and Snowflake.
- Generate table definitions automatically through metadata.
- Pipe data from the S3 bucket into Snowflake automatically.
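The worksheet in /assets/setup.sql is the source of truth for these steps. As a rough, hedged sketch of what they can look like in Snowflake SQL, the example below assumes placeholder object names (SAP_S3_INT, SAP_RAW_STAGE, RAW_FI_AR_4), a placeholder bucket path, and Parquet output from the Glue job; adjust all of these to your environment.

```sql
-- Illustrative sketch only: names, paths, and file format are assumptions.
-- 1. Secure integration between Snowflake and the S3 bucket the Glue job writes to.
CREATE STORAGE INTEGRATION IF NOT EXISTS SAP_S3_INT
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account-id>:role/<snowflake-access-role>'
  STORAGE_ALLOWED_LOCATIONS = ('s3://<your-bucket>/sap/');

-- 2. File format and external stage over the extracted files (Parquet assumed).
CREATE FILE FORMAT IF NOT EXISTS SAP_PARQUET_FF TYPE = PARQUET;
CREATE STAGE IF NOT EXISTS SAP_RAW_STAGE
  URL = 's3://<your-bucket>/sap/'
  STORAGE_INTEGRATION = SAP_S3_INT
  FILE_FORMAT = (FORMAT_NAME = 'SAP_PARQUET_FF');

-- 3. Generate the table definition from file metadata instead of typing it by hand.
CREATE TABLE IF NOT EXISTS RAW_FI_AR_4
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(INFER_SCHEMA(LOCATION => '@SAP_RAW_STAGE/0FI_AR_4/',
                            FILE_FORMAT => 'SAP_PARQUET_FF')));

-- 4. Pipe that loads new files automatically as they land in S3.
--    AUTO_INGEST also requires an S3 event notification pointing at the pipe's queue.
CREATE PIPE IF NOT EXISTS SAP_FI_AR_PIPE AUTO_INGEST = TRUE AS
  COPY INTO RAW_FI_AR_4
  FROM '@SAP_RAW_STAGE/0FI_AR_4/'
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```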
For the transformation semantics, use large language models (LLMs) to scan the tables and build up the business language automatically, including suggestions for how the tables join together.
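One way such a scan can be expressed directly in SQL is through Snowflake Cortex; the sketch below is illustrative only and assumes placeholder table names and a model (mistral-large) that is available in your region. The notebook in this repository may use a different model, prompt, or approach.

```sql
-- Hedged sketch: collect column metadata for two raw tables and ask an LLM for
-- business-friendly descriptions plus likely join keys between them.
WITH cols AS (
  SELECT TABLE_NAME,
         LISTAGG(COLUMN_NAME || ' ' || DATA_TYPE, ', ')
           WITHIN GROUP (ORDER BY ORDINAL_POSITION) AS col_list
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_NAME IN ('RAW_FI_AR_4', 'RAW_KNA1')
  GROUP BY TABLE_NAME
)
SELECT SNOWFLAKE.CORTEX.COMPLETE(
         'mistral-large',
         'These are SAP tables with their columns: ' ||
         LISTAGG(TABLE_NAME || ': ' || col_list, '; ') ||
         '. Describe each table in business terms and suggest join keys between them.'
       ) AS semantic_suggestions
FROM cols;
```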
After that, one more step triggers the transforms. Snowflake dynamic tables are a declarative way to do this: write your SQL transform statement, specify the refresh lag you want (say, 5 minutes), and Snowflake handles the change data capture for you.
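For example, a dynamic table with a 5-minute target lag might look like the minimal sketch below; the warehouse, table names, and SAP field names (KUNNR, NAME1, DMBTR) are assumptions for illustration, not the exact transforms shipped with this repository.

```sql
-- Minimal sketch: Snowflake keeps this result at most ~5 minutes behind the sources.
CREATE OR REPLACE DYNAMIC TABLE AR_OPEN_ITEMS_BY_CUSTOMER
  TARGET_LAG = '5 minutes'
  WAREHOUSE = TRANSFORM_WH
AS
SELECT c.KUNNR      AS customer_id,
       c.NAME1      AS customer_name,
       SUM(a.DMBTR) AS open_amount,
       COUNT(*)     AS open_items
FROM RAW_FI_AR_4 a
JOIN RAW_KNA1    c ON a.KUNNR = c.KUNNR
GROUP BY c.KUNNR, c.NAME1;
```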
From there, it is a quick step to start using Snowflake Cortex or AWS AI services.
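As one illustrative (not prescriptive) example of that consumption step, assuming the dynamic table sketched above and Cortex availability in your region:

```sql
-- Hedged sketch: ask an LLM to summarize the transformed accounts receivable data.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
         'mistral-large',
         'Summarize notable patterns in this accounts receivable snapshot: ' ||
         (SELECT LISTAGG(customer_name || '=' || open_amount, ', ')
          FROM AR_OPEN_ITEMS_BY_CUSTOMER)
       ) AS ar_summary;
```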
- Open an SQL worksheet in Snowflake.
- Copy and paste the following code into it.
- Review the links in the code on creating an integration between Snowflake and Amazon S3 to obtain your STORAGE_AWS_ROLE_ARN and STORAGE_AWS_EXTERNAL_ID (see the sketch after this list).
- Run the code statement by statement to set up the integrations and environment.
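For orientation, the values Snowflake reports for the integration can be retrieved as in the sketch below (the integration name is a placeholder; the worksheet and the linked documentation remain the authoritative steps):

```sql
-- After creating the storage integration, Snowflake reports the values needed to
-- finish the AWS side of the trust relationship.
DESC STORAGE INTEGRATION SAP_S3_INT;
-- Copy STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID from the output into the
-- trust policy of the IAM role whose ARN you supplied as STORAGE_AWS_ROLE_ARN.
```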
Open the SAP_PREP_GOLD notebook and execute each cell. You can verify the output of each one.
Use a Snowflake notebook (SAP_PREP_GOLD.ipynb) to automatically:
- Build business semantics.
- Build a reporting mart.
- Optimize query performance (see the sketch after this list).
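The notebook drives these steps end to end; as a hedged illustration of what the query-performance step can involve (assuming a regular reporting table with placeholder names, not necessarily what SAP_PREP_GOLD.ipynb executes):

```sql
-- Cluster the reporting mart on the columns most queries filter or join on.
ALTER TABLE AR_REPORTING_MART CLUSTER BY (customer_id, posting_date);

-- Optionally enable search optimization for selective point lookups.
ALTER TABLE AR_REPORTING_MART ADD SEARCH OPTIMIZATION;
```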
- Verify the S3 bucket and ETL script:
  aws s3 ls s3://$BUCKET_NAME/scripts/sap_to_snowflake.py
- Check the IAM role configuration:
  aws iam get-role --role-name GlueETLRole-SAP-Snowflake --query 'Role.[RoleName, Arn]' --output text
- Validate the AWS Secrets Manager secret:
  aws secretsmanager describe-secret --secret-id sap-snowflake-credentials --query 'ARN' --output text
- Test the AWS Glue connections:
  aws glue test-connection --name sap-connection
  aws glue test-connection --name snowflake-connection
- Verify the AWS Glue job setup:
  aws glue get-job --job-name sap-to-snowflake-etl --query 'Job.[Name, Role, Command.ScriptLocation]' --output text
- Run a test job and capture its status:
  JOB_RUN_ID=$(aws glue start-job-run --job-name sap-to-snowflake-etl --arguments '{"--limit":"10"}' --query 'JobRunId' --output text)
  aws glue get-job-run --job-name sap-to-snowflake-etl --run-id $JOB_RUN_ID --query 'JobRun.JobRunState' --output text
- Check CloudWatch logs for errors:
  aws logs get-log-events --log-group-name "/aws-glue/jobs/output" --log-stream-name $JOB_RUN_ID --limit 5
- Verify data in Snowflake (connect to Snowflake and run):
  SELECT COUNT(*) FROM YOUR_SNOWFLAKE_DB.YOUR_SNOWFLAKE_SCHEMA.YOUR_SNOWFLAKE_TABLE;
- Generate a quick validation report:
  echo "Deployment Status:" &&
  echo "S3: $(aws s3 ls s3://$BUCKET_NAME &>/dev/null && echo '✅' || echo '❌')" &&
  echo "IAM: $(aws iam get-role --role-name GlueETLRole-SAP-Snowflake &>/dev/null && echo '✅' || echo '❌')" &&
  echo "Job: $(aws glue get-job --job-name sap-to-snowflake-etl &>/dev/null && echo '✅' || echo '❌')" &&
  echo "Last Run: $(aws glue get-job-run --job-name sap-to-snowflake-etl --run-id $JOB_RUN_ID --query JobRun.JobRunState --output text)"
SAP to Snowflake AWS Infrastructure Cost Estimate
A comprehensive cost breakdown for building the SAP to Snowflake solution (excluding SAP licensing costs):
Core Infrastructure Components
AWS Glue (Data Processing)
- Pricing: $0.44 per DPU-Hour, billed per second with a 1-minute minimum
- Estimated Usage: daily ETL jobs processing SAP data
  - Assuming a 2-hour daily job with 10 DPUs: 10 × 2 × $0.44 = $8.80/day
- Monthly Cost: ~$264

Amazon S3 (Data Storage)
- Raw Data Storage: $0.023 per GB/month for Standard storage
- Estimated Usage:
  - Initial data load: 100GB-1TB
  - Daily incremental: 1-10GB
- Monthly Cost: $50-300 (depending on data volume)

Snowflake Integration Costs
- Data Transfer: between AWS and Snowflake
  - Inter-region transfer: $0.02-0.09 per GB
  - Estimated monthly: $50-200
- Snowflake Compute: based on usage patterns
  - Standard edition: base pricing
  - Enterprise/Business Critical: 1.5x-3x higher for enhanced security
Total Monthly Cost Estimate
Small Implementation (Development/Testing)
- AWS Glue: $264
- S3 Storage: $50
- SAP Infrastructure: $500
- Data Transfer: $50
- Total: ~$864/month
Medium Implementation (Production)
- AWS Glue: $500 (more frequent jobs)
- S3 Storage: $150
- SAP Infrastructure: $1,200
- Data Transfer: $100
- Snowflake Integration: $200
- Total: ~$2,150/month
Large Implementation (Enterprise)
- AWS Glue: $1,000+ (multiple daily jobs, larger DPUs)
- S3 Storage: $300
- SAP Infrastructure: $2,000+
- Data Transfer: $200
- Additional AWS services (VPC, CloudWatch, etc.): $100
- Total: ~$3,600+/month
Cost Optimization Recommendations
- Optimize Glue Jobs: Right-size DPUs and minimize runtime
- Implement Data Lifecycle Policies: Move older data to cheaper S3 storage classes
- Monitor Data Transfer: Optimize cross-region transfers to minimize costs
Additional Considerations
- Snowflake Costs: Separate from AWS, based on compute credits and storage
- Network Costs: VPN/Direct Connect for secure connectivity
- Monitoring & Management: CloudWatch, additional operational tools
- Backup & DR: Additional storage and compute for disaster recovery
The actual costs will vary significantly based on:
- Data volume and processing frequency
- SAP system size and complexity
- Snowflake usage patterns
- Regional pricing differences
- Specific security and compliance requirements
| Component | Description | Pricing Model | Small (Dev/Test) | Medium (Production) | Large (Enterprise) |
|---|---|---|---|---|---|
| AWS Glue | Data Processing | $0.44 per DPU-Hour | $264 | $500 | $1,000+ |
| Amazon S3 | Data Storage | $0.023 per GB/month | $50 | $150 | $300 |
| Data Transfer | AWS-Snowflake | $0.02-0.09 per GB | $50 | $100 | $200 |
| Snowflake Integration | Additional Services | Variable | - | $200 | Variable |
| Additional AWS Services | VPC, CloudWatch, etc. | Variable | - | - | $100 |
| TOTAL MONTHLY COST | All components above (SAP infrastructure not included) | - | ~$364 | ~$950 | ~$1,600+ |
- Scale AWS Glue Workers: Adjust --number-of-workers and --worker-type based on data volume (e.g., increase workers for larger datasets)
- Optimize Memory Settings: Modify --conf spark.driver.memory and --conf spark.executor.memory for better performance
- Customize ETL Logic: Adapt sap_to_snowflake.py script to include specific data transformations, filters, or business rules
- Enhanced Security: Add VPC endpoints, implement column-level encryption, or integrate with AWS KMS for additional security
- Data Quality Checks: Include custom validation rules in the ETL script to ensure data integrity
- Monitoring: Set up CloudWatch alarms for job failures, data quality issues, or performance metrics
- Cost Optimization: Implement job bookmarking for incremental loads to reduce processing time and costs
- Error Handling: Add custom error handling and notification mechanisms (e.g., SNS topics for job failures)
- Performance Tuning: Adjust Snowflake warehouse size and implement table clustering based on query patterns (see the sketch after this list)
- Scheduling: Modify the default cron expression cron(0 2 * * ? *) (daily at 02:00 UTC) to match your business requirements
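On the Snowflake side, the performance-tuning item above might translate into statements like the following sketch; the warehouse and dynamic table names are placeholders:

```sql
-- Resize the transform warehouse when refreshes start lagging.
ALTER WAREHOUSE TRANSFORM_WH SET WAREHOUSE_SIZE = 'MEDIUM';

-- Trade freshness for cost by relaxing the refresh lag of a dynamic table.
ALTER DYNAMIC TABLE AR_OPEN_ITEMS_BY_CUSTOMER SET TARGET_LAG = '15 minutes';
```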
Delete AWS Glue Resources

1. Delete the Glue trigger (if created):
   aws glue delete-trigger --name daily-sap-to-snowflake
2. Delete the Glue job:
   aws glue delete-job --job-name sap-to-snowflake-etl
3. Delete the Glue connections:
   aws glue delete-connection --connection-name sap-connection
   aws glue delete-connection --connection-name snowflake-connection

Empty and Delete the S3 Bucket

4. Empty the bucket first (required before deletion):
   aws s3 rm s3://$BUCKET_NAME --recursive
5. Delete the bucket:
   aws s3 rb s3://$BUCKET_NAME

Delete the AWS Secrets Manager Secret

6. Delete the secret with a recovery window:
   aws secretsmanager delete-secret \
     --secret-id sap-snowflake-credentials \
     --recovery-window-in-days 7
   Or force delete it without a recovery window:
   aws secretsmanager delete-secret \
     --secret-id sap-snowflake-credentials \
     --force-delete-without-recovery

Remove the IAM Role and Policies

7. Detach the managed policy:
   aws iam detach-role-policy --role-name GlueETLRole-SAP-Snowflake \
     --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
8. Delete the role:
   aws iam delete-role --role-name GlueETLRole-SAP-Snowflake

Verify Resource Deletion

9. Run validation checks to confirm the cleanup:
   echo "Cleanup Validation:" && \
   echo "S3: $(aws s3 ls s3://$BUCKET_NAME 2>&1 | grep -q 'NoSuchBucket' && echo '✅' || echo '❌')" && \
   echo "IAM: $(aws iam get-role --role-name GlueETLRole-SAP-Snowflake 2>&1 | grep -q 'NoSuchEntity' && echo '✅' || echo '❌')" && \
   echo "Job: $(aws glue get-job --job-name sap-to-snowflake-etl 2>&1 | grep -q 'EntityNotFoundException' && echo '✅' || echo '❌')"
Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers.
Ankit Mathur, Abhijeet Jangam

