Guidance for Integrating SAP and Non-SAP Data using Snowflake on AWS

This Guidance demonstrates how to implement an SAP Cloud Lakehouse using Snowflake on AWS, addressing the challenge of integrating SAP and non-SAP data for comprehensive analytics through step-by-step implementation instructions.

This repository provides templates and code to establish a comprehensive demo environment that showcases data integration from SAP S/4HANA to Snowflake. The solution enables you to create dataflows from an SAP S/4HANA fully activated appliance, copy data to your own Snowflake database, model the data for transformation into a data mart, and implement dynamic tables for automated refreshes.

The extraction layer demonstrates multiple methods to access SAP data, including standard SAP extractors (using 0FI_AR_4 for accounts receivable as an example), custom views on core tables (demonstrated with the KNA1 customer table), and direct access to ABAP CDS views (exemplified with I_MATERIAL). For transformation, the solution leverages large language models (LLMs) to automatically scan tables and build business-friendly semantic definitions, including intelligent detection of table relationships and join patterns.

Snowflake dynamic tables provide a declarative approach to transformation, allowing you to write SQL transform statements with configurable refresh intervals (such as 5-minute cycles) while Snowflake handles the change data capture process automatically. Once your data is processed and structured, it becomes readily available for consumption through Snowflake Cortex or AWS AI services, enabling advanced analytics and artificial intelligence applications with minimal additional configuration.
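
As a concrete illustration of this declarative pattern, the sketch below defines a dynamic table over a hypothetical raw accounts receivable table. The object names (RAW_0FI_AR_4, BRONZE_AR_OPEN_ITEMS, TRANSFORM_WH) and the selected fields are illustrative placeholders, not definitions taken from this repository.

    -- Minimal sketch of a dynamic table with a 5-minute target lag.
    -- All object names are illustrative placeholders.
    CREATE OR REPLACE DYNAMIC TABLE BRONZE_AR_OPEN_ITEMS
      TARGET_LAG = '5 minutes'
      WAREHOUSE  = TRANSFORM_WH
    AS
    SELECT
      KUNNR AS CUSTOMER_ID,          -- customer number
      BELNR AS DOCUMENT_NUMBER,      -- accounting document
      BUDAT AS POSTING_DATE,
      DMBTR AS AMOUNT_LOCAL_CURRENCY
    FROM RAW_0FI_AR_4
    WHERE AUGBL IS NULL;             -- keep only open (uncleared) items

Snowflake then keeps BRONZE_AR_OPEN_ITEMS within roughly five minutes of the raw table without any hand-written change data capture logic.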

Reference Architecture

SAP to Snowflake data integration options (architecture diagram)

Snowflake Cloud Lakehouse on AWS (architecture diagram)

Prerequisites

  • AWS Account with permissions to create and manage:

    • AWS Glue resources (jobs, connections, development endpoints)
    • IAM roles and policies
    • Amazon S3 buckets
    • AWS Secrets Manager secrets
    • CloudWatch logs
  • SAP System with:

    • OData services enabled
    • Appropriate user credentials with read permissions
  • Snowflake Account with:

    • Database and schema created
    • Warehouse configured
    • User with appropriate privileges
  • Development Tools:

    • AWS CLI installed and configured
    • Python 3.7 or higher
    • Git client

Step-by-step guide for running the Solution Guidance

Step 1 - Set up SAP S/4HANA

Activate an S/4HANA demo system (for example, through SAP Cloud Appliance Library (CAL) or an on-premise installation), or bring your own system that already has connectivity established with your AWS account.

You also need to obtain the SAP Test and Demo User starter package (TD_7016852) from the SAP partner pricing app. If you do not install the license keys, the system (whether CAL or on-premise) will stop running after 30-90 days.

Prepare the data sources (CDS views, tables, BW extractors) for initial and delta consumption through the ODP OData API (v2). In SAP GUI, activate extraction through the ODP OData API for each of the three source types used in this guidance: SAP BW extractors, core tables, and ABAP CDS views.

Step 2 - Set up Snowflake

Go to Snowflake.com, select Start for Free and follow the steps to create your account.

Step 3 - Extract SAP data to Amazon S3 bucket using AWS Glue SAP OData Connector

  1. Log in to the AWS Console
  2. Configure AWS Glue to use the publicly signed S/4HANA URL by following the linked blog post, adapting it to AWS Glue rather than Amazon AppFlow. In the step "Allow principals to access VPC Endpoint Services", add the principal glue.amazonaws.com instead of appflow.amazonaws.com.

For the sample SAP data extraction, this guidance uses SAP BW extractors (0FI_AR_4 for accounts receivable), an SAP table (KNA1, the customer master table), and ABAP CDS views (I_MATERIAL).

To use AWS Glue with SAP S/4HANA (S4H)

  1. Run the IAM_Roles CloudFormation template, which creates the necessary AWSGlueServiceRole and Amazon S3 buckets
  2. Manually create the S4H connection in AWS Glue
  3. Run the SAP to AWS data extraction CloudFormation template, which creates the corresponding ETL jobs in AWS Glue. Note that you need to run this CloudFormation template separately for each source SAP OData entity. The subsequent code uses FI/AR, Customers, and Products data, but you can adapt it to any dataset you like.
  4. Run the AWS Glue ETL job(s).
  5. Check the extracted SAP data files landed in the S3 bucket.

Step 4 - Create the RAW layer (Snowflake SQL)

Use the [Snowflake SQL worksheet](/assets/setup.sql) to do the following (an illustrative sketch with placeholder names appears after this list):

  • Activate a secure integration between AWS and Snowflake.
  • Generate table definitions automatically through metadata.
  • Pipe data from the S3 bucket into Snowflake automatically.
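
The actual worksheet covers obtaining your STORAGE_AWS_ROLE_ARN and STORAGE_AWS_EXTERNAL_ID and the exact object definitions; the sketch below only shows the general shape of these three steps, and every name, ARN, bucket, and path in it is a placeholder to replace with your own values.

    -- 1. Secure integration between Snowflake and the S3 bucket (placeholder ARN and bucket).
    CREATE OR REPLACE STORAGE INTEGRATION SAP_S3_INT
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'S3'
      ENABLED = TRUE
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account-id>:role/<snowflake-access-role>'
      STORAGE_ALLOWED_LOCATIONS = ('s3://<your-bucket>/sap/');

    CREATE OR REPLACE FILE FORMAT SAP_PARQUET_FF TYPE = PARQUET;

    CREATE OR REPLACE STAGE SAP_RAW_STAGE
      STORAGE_INTEGRATION = SAP_S3_INT
      URL = 's3://<your-bucket>/sap/'
      FILE_FORMAT = (FORMAT_NAME = 'SAP_PARQUET_FF');

    -- 2. Generate a table definition automatically from file metadata.
    CREATE OR REPLACE TABLE RAW_0FI_AR_4
      USING TEMPLATE (
        SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
        FROM TABLE(INFER_SCHEMA(
          LOCATION    => '@SAP_RAW_STAGE/0FI_AR_4/',
          FILE_FORMAT => 'SAP_PARQUET_FF')));

    -- 3. Auto-ingest new files landed by the AWS Glue job.
    CREATE OR REPLACE PIPE RAW_0FI_AR_4_PIPE AUTO_INGEST = TRUE AS
      COPY INTO RAW_0FI_AR_4
      FROM @SAP_RAW_STAGE/0FI_AR_4/
      FILE_FORMAT = (FORMAT_NAME = 'SAP_PARQUET_FF')
      MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

With AUTO_INGEST enabled, you would also configure S3 event notifications to the pipe's notification channel so that new files are loaded as they arrive.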

For the transformation semantics, use large language models (LLMs) to scan the tables and build the business-friendly definitions automatically; the models can also detect table relationships and propose joins.

After that, one more step triggers the transforms. Snowflake dynamic tables provide a declarative way to do this: write your SQL transform statement, set the refresh lag you want (for example, 5 minutes), and Snowflake handles the change data capture for you.

From there, it is a short step to start using Snowflake Cortex or AWS AI services on the curated data.
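
As one possible illustration of the LLM-assisted semantics step, a Cortex LLM function can draft column-level business descriptions directly in SQL. The model name, prompt, and table below are assumptions made for this sketch, not the exact approach used by the repository's worksheet or notebook.

    -- Sketch: ask a Cortex LLM to propose business-friendly descriptions
    -- for the columns of a (placeholder) raw customer table.
    SELECT
      COLUMN_NAME,
      SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large',
        'In one sentence, describe the business meaning of the SAP field '
          || COLUMN_NAME || ' from the customer master table KNA1.'
      ) AS SUGGESTED_DESCRIPTION
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'RAW_KNA1'
    LIMIT 10;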

  1. Open an SQL worksheet in Snowflake.
  2. Copy and paste the following code into it.
  3. Review the links in the code on creating an integration between Snowflake and AWS S3 to obtain your STORAGE_AWS_ROLE_ARN and STORAGE_AWS_EXTERNAL_ID.
  4. Run the code line by line to set up the integrations and environment.

Step 5 - Create Bronze & Gold layers (Snowflake Notebook)

Open the SAP_PREP_GOLD notebook and execute each cell. You can verify the output of each one.

Use a Snowflake notebook (SAP_PREP_GOLD.ipynb) to automatically do the following (an illustrative gold-layer example follows the list):

  • Build business semantics.
  • Build a reporting mart.
  • Optimize query performance.
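
For orientation, a gold-layer object produced by such a notebook might look like the following dynamic table, which joins open receivables to customer attributes. All names (BRONZE_AR_OPEN_ITEMS, BRONZE_CUSTOMERS, GOLD_AR_BY_CUSTOMER, TRANSFORM_WH) are placeholders, and the actual notebook logic may differ.

    -- Illustrative gold-layer reporting mart (placeholder names throughout).
    CREATE OR REPLACE DYNAMIC TABLE GOLD_AR_BY_CUSTOMER
      TARGET_LAG = '5 minutes'
      WAREHOUSE  = TRANSFORM_WH
    AS
    SELECT
      c.CUSTOMER_ID,
      c.CUSTOMER_NAME,
      c.COUNTRY,
      SUM(a.AMOUNT_LOCAL_CURRENCY) AS OPEN_AR_AMOUNT,
      COUNT(*)                     AS OPEN_ITEM_COUNT
    FROM BRONZE_AR_OPEN_ITEMS a
    JOIN BRONZE_CUSTOMERS     c
      ON a.CUSTOMER_ID = c.CUSTOMER_ID
    GROUP BY c.CUSTOMER_ID, c.CUSTOMER_NAME, c.COUNTRY;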

Deployment Validation

  1. Verify S3 bucket and ETL script

    aws s3 ls s3://$BUCKET_NAME/scripts/sap_to_snowflake.py

  2. Check IAM role configuration

    aws iam get-role --role-name GlueETLRole-SAP-Snowflake --query 'Role.[RoleName, Arn]' --output text

  3. Validate AWS Secrets Manager

    aws secretsmanager describe-secret --secret-id sap-snowflake-credentials --query 'ARN' --output text

  4. Test AWS Glue connections

    aws glue test-connection --name sap-connection
    aws glue test-connection --name snowflake-connection

  5. Verify AWS Glue job setup

    aws glue get-job --job-name sap-to-snowflake-etl --query 'Job.[Name, Role, Command.ScriptLocation]' --output text

  6. Run test job and capture status

    JOB_RUN_ID=$(aws glue start-job-run --job-name sap-to-snowflake-etl --arguments '{"--limit":"10"}' --query 'JobRunId' --output text)
    aws glue get-job-run --job-name sap-to-snowflake-etl --run-id $JOB_RUN_ID --query 'JobRun.JobRunState' --output text

  7. Check CloudWatch logs for errors

    aws logs get-log-events --log-group-name "/aws-glue/jobs/output" --log-stream-name $JOB_RUN_ID --limit 5

  8. Verify data in Snowflake (Connect to Snowflake and run):

    SELECT COUNT(*) FROM YOUR_SNOWFLAKE_DB.YOUR_SNOWFLAKE_SCHEMA.YOUR_SNOWFLAKE_TABLE;

  9. Generate quick validation report

    echo "Deployment Status:" &&
    echo "S3: $(aws s3 ls s3://$BUCKET_NAME &>/dev/null && echo '✅' || echo '❌')" &&
    echo "IAM: $(aws iam get-role --role-name GlueETLRole-SAP-Snowflake &>/dev/null && echo '✅' || echo '❌')" &&
    echo "Job: $(aws glue get-job --job-name sap-to-snowflake-etl &>/dev/null && echo '✅' || echo '❌')" &&
    echo "Last Run: $(aws glue get-job-run --job-name sap-to-snowflake-etl --run-id $JOB_RUN_ID --query JobRun.JobRunState --output text)"

Cost Estimates

SAP to Snowflake AWS Infrastructure Cost Estimate

A comprehensive cost breakdown for building the SAP to Snowflake solution (excluding SAP licensing costs):

Core Infrastructure Components

AWS Glue (Data Processing)

  • Pricing: $0.44 per DPU-Hour, billed per second with 1-minute minimum
  • Estimated Usage: For daily ETL jobs processing SAP data
    • Assuming 2-hour daily job with 10 DPUs: 10 × 2 × $0.44 = $8.80/day
    • Monthly Cost: ~$264

Amazon S3 (Data Storage)

  • Raw Data Storage: $0.023 per GB/month for Standard storage
  • Estimated Usage:
    • Initial data load: 100GB-1TB
    • Daily incremental: 1-10GB
    • Monthly Cost: $50-300 (depending on data volume)

Snowflake Integration Costs

  • Data Transfer: Between AWS and Snowflake
    • Inter-region transfer: $0.02-0.09 per GB
    • Estimated monthly: $50-200
  • Snowflake Compute: Based on usage patterns
    • Standard edition: Base pricing
    • Enterprise/Business Critical: 1.5x-3x higher for enhanced security

Total Monthly Cost Estimate

Small Implementation (Development/Testing)

  • AWS Glue: $264
  • S3 Storage: $50
  • SAP Infrastructure: $500
  • Data Transfer: $50
  • Total: ~$864/month

Medium Implementation (Production)

  • AWS Glue: $500 (more frequent jobs)
  • S3 Storage: $150
  • SAP Infrastructure: $1,200
  • Data Transfer: $100
  • Snowflake Integration: $200
  • Total: ~$2,150/month

Large Implementation (Enterprise)

  • AWS Glue: $1,000+ (multiple daily jobs, larger DPUs)
  • S3 Storage: $300
  • SAP Infrastructure: $2,000+
  • Data Transfer: $200
  • Additional AWS services (VPC, CloudWatch, etc.): $100
  • Total: ~$3,600+/month

Cost Optimization Recommendations

  1. Optimize Glue Jobs: Right-size DPUs and minimize runtime
  2. Implement Data Lifecycle Policies: Move older data to cheaper S3 storage classes
  3. Monitor Data Transfer: Optimize cross-region transfers to minimize costs

Additional Considerations

  • Snowflake Costs: Separate from AWS, based on compute credits and storage
  • Network Costs: VPN/Direct Connect for secure connectivity
  • Monitoring & Management: CloudWatch, additional operational tools
  • Backup & DR: Additional storage and compute for disaster recovery

The actual costs will vary significantly based on:

  • Data volume and processing frequency
  • SAP system size and complexity
  • Snowflake usage patterns
  • Regional pricing differences
  • Specific security and compliance requirements

| Component | Description | Pricing Model | Small (Dev/Test) | Medium (Production) | Large (Enterprise) |
|---|---|---|---|---|---|
| AWS Glue | Data Processing | $0.44 per DPU-Hour | $264 | $500 | $1,000+ |
| Amazon S3 | Data Storage | $0.023 per GB/month | $50 | $150 | $300 |
| Data Transfer | AWS-Snowflake | $0.02-0.09 per GB | $50 | $100 | $200 |
| Snowflake Integration | Additional Services | Variable | - | $200 | Variable |
| Additional AWS Services | VPC, CloudWatch, etc. | Variable | - | - | $100 |
| TOTAL MONTHLY COST (excluding SAP infrastructure) | All Components | - | ~$364 | ~$950 | ~$1,600+ |

Next Steps

  1. Scale AWS Glue Workers: Adjust --number-of-workers and --worker-type based on data volume (e.g., increase workers for larger datasets)
  2. Optimize Memory Settings: Modify --conf spark.driver.memory and --conf spark.executor.memory for better performance
  3. Customize ETL Logic: Adapt sap_to_snowflake.py script to include specific data transformations, filters, or business rules
  4. Enhanced Security: Add VPC endpoints, implement column-level encryption, or integrate with AWS KMS for additional security
  5. Data Quality Checks: Include custom validation rules in the ETL script to ensure data integrity
  6. Monitoring: Set up CloudWatch alarms for job failures, data quality issues, or performance metrics
  7. Cost Optimization: Implement job bookmarking for incremental loads to reduce processing time and costs
  8. Error Handling: Add custom error handling and notification mechanisms (e.g., SNS topics for job failures)
  9. Performance Tuning: Adjust Snowflake warehouse size and implement table clustering based on query patterns (see the SQL sketch after this list)
  10. Scheduling: Modify the default cron expression (cron(0 2 * * ? *)) to match your business requirements
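
For Next Step 9, the statements below show the general shape of warehouse resizing and table clustering in Snowflake; the warehouse, table, and clustering columns are placeholders to adapt to your own workload.

    -- Resize the transformation warehouse (placeholder name).
    ALTER WAREHOUSE TRANSFORM_WH SET WAREHOUSE_SIZE = 'MEDIUM';

    -- Cluster a large reporting table on its most common filter columns
    -- (placeholder table and columns).
    ALTER TABLE GOLD_SALES_FACT CLUSTER BY (POSTING_DATE, CUSTOMER_ID);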

Cleanup

Delete AWS Glue Resources


1. Delete Glue trigger (if created)

    aws glue delete-trigger --name daily-sap-to-snowflake

2. Delete Glue job

    aws glue delete-job --job-name sap-to-snowflake-etl

3. Delete Glue connections

    aws glue delete-connection --connection-name sap-connection
    aws glue delete-connection --connection-name snowflake-connection

4. Empty and Delete S3 Bucket

    # Empty the bucket first (required before deletion)
    aws s3 rm s3://$BUCKET_NAME --recursive

5. Delete the bucket
    
    aws s3 rb s3://$BUCKET_NAME

6. Delete AWS Secrets Manager Secret

    # Delete the secret with a recovery window
    aws secretsmanager delete-secret \
      --secret-id sap-snowflake-credentials \
      --recovery-window-in-days 7

    # Or force delete without a recovery window
    aws secretsmanager delete-secret \
      --secret-id sap-snowflake-credentials \
      --force-delete-without-recovery

7. Remove IAM Role and Policies

    # Detach managed policies
    aws iam detach-role-policy --role-name GlueETLRole-SAP-Snowflake \
      --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole

8. Delete role

    aws iam delete-role --role-name GlueETLRole-SAP-Snowflake


9. Verify Resource Deletion

    Run the following validation checks to confirm cleanup:

    echo "Cleanup Validation:" && \
    echo "S3: $(aws s3 ls s3://$BUCKET_NAME 2>&1 | grep -q 'NoSuchBucket' && echo '✅' || echo '❌')" && \
    echo "IAM: $(aws iam get-role —role-name GlueETLRole-SAP-Snowflake 2>&1 | grep -q 'NoSuchEntity' && echo '✅' || echo '❌')" && \
    echo "Job: $(aws glue get-job —job-name sap-to-snowflake-etl 2>&1 | grep -q 'EntityNotFoundException' && echo '✅' || echo '❌')“

Notices

Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers.

Authors

Ankit Mathur, Abhijeet Jangam
