- Overview
- High-level Workflow
- Architecture
- Prerequisites
- Deployment and Validation Steps
- Next Steps
- Cleanup
- License
- Notices
- Authors
This repository contains documentation for a Spatial Computing/GenAI prototype solution that was presented at AWS re:Invent 2023. It includes the following major components within the source directory:
- A CDK-deployable project [see `cdk.json` and the `lib` and `bin` directories]
- Lambda code for basic GenAI-based ASL translation logic and request/message passing [see the `functions` directory]
- Sample code for a CloudFront-based web interface (for users to speak or type phrases to be translated) [see the `frontend` directory]
- An Unreal Engine 5.3 C++ project (displays ASL translations performed by a realistic 3D avatar) [see the `ue` directory]
Please monitor this repository for future updates as more code and documentation become available.
This sample code/prototype demonstrates a novel way to pre-process/transform multilingual phrases into an equivalent literal (or direct) form for translation into American Sign Language (ASL). This pre-processing step improves sign language translation fidelity by expressing user-provided (input) phrases more clearly than they were initially expressed. GenAI is applied to re-interpret these multilingual input phrases into simpler, more explicit English phrases across multiple iterations/passes. The resulting phrases lend themselves to a more faithful interpretation than the output of traditional, non-GenAI-based translation tools. Finally, this prototype animates a realistic avatar in Unreal Engine (via the MetaHuman plugin) to visually depict the ASL translation of those resulting phrases. ASL translation in this prototype is based on a very loose/naïve interpretation of ASL rules and grammar, primarily involving hand and arm movements, all of which end users can refine. The main goals of this project are to improve the translations of existing robust ASL translation engines (via GenAI pre-processing) and to provide an engaging multimodal interface for viewing ASL translations.
See `translation-example.mp4` in this repository for a demonstration of the end-to-end translation workflow.
- An end user speaks (or types) a phrase in a spoken language of choice
- That spoken phrase is transcribed to text
- The transcribed phrase is translated (comprehended) via Generative AI into English, then simplified across multiple iterations using carefully crafted Bedrock prompts (see the sketch after this list)
- An Avatar in Unreal Engine software animates ASL gestures ("signs") corresponding to the simplified transcription
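To make the simplification pass concrete, here is a minimal Python sketch that calls Anthropic Claude v2 on Amazon Bedrock through boto3. The prompt wording, pass count, and parameter values are illustrative assumptions, not the carefully crafted prompts shipped in this repository's Lambda code:

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")  # use a Region where Bedrock is available

# Illustrative prompt only; the repository's Lambda functions use their own prompts.
SIMPLIFY_PROMPT = (
    "\n\nHuman: Rewrite the following phrase as a short, literal English sentence "
    "suitable for American Sign Language translation: {phrase}\n\nAssistant:"
)

def simplify(phrase: str, passes: int = 2) -> str:
    """Simplify a phrase over multiple passes using Claude v2 on Amazon Bedrock."""
    for _ in range(passes):
        body = json.dumps({
            "prompt": SIMPLIFY_PROMPT.format(phrase=phrase),
            "max_tokens_to_sample": 200,
            "temperature": 0.2,
        })
        response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
        phrase = json.loads(response["body"].read())["completion"].strip()
    return phrase

print(simplify("Could you possibly point me toward the nearest railway station?"))
```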
- User authenticates to Amazon Cognito through an Amazon CloudFront-hosted website or web API (using an Amazon Cognito-issued JWT access token).
- User types or speaks an input phrase in a chosen language, which Amazon Transcribe transcribes. The transcription is stored in an Amazon Simple Storage Service (Amazon S3) bucket.
- User requests an action (like ASL translate, change avatar, or change background image) through the website or web API (Amazon API Gateway endpoint).
- Based on the user-requested action, API Gateway routes the request to the corresponding AWS Lambda function for processing.
- For ASL translation requests, the matching AWS Lambda function invokes the Amazon Bedrock API to form an ASL phrase for the provided input phrase and to generate a contextual 2D image (stored in an S3 bucket).
- Amazon Comprehend and Amazon Bedrock perform multilingual toxicity checks on the input phrase, and Amazon Rekognition performs visual toxicity checks on the generated 2D images. Toxicity check results are returned to the respective Lambda functions.
- All Lambda functions generate a JSON-based payload that captures the user-requested action for Epic Games Unreal Engine. Each payload is sent to a corresponding Amazon Simple Notification Service (Amazon SNS) topic: Translation or Non-Translation (the Lambda-side sketch after this list illustrates the moderation-and-publish flow).
- Each Amazon SNS-based payload is transmitted to its corresponding Amazon Simple Queue Service (Amazon SQS) queue for later consumption by Unreal Engine.
- Using the AWS SDK, the Unreal Engine application polls its Amazon SQS queues and dequeues action-based payloads (see the polling sketch after this list). For translation requests, background images are fetched from an S3 bucket.
- Based on each payload received, the Unreal Engine application performs the user-requested action and displays the resulting video output on that user's system. This output provides an ASL-equivalent interpretation of the input phrase: a MetaHuman 3D avatar animation accompanied by the ASL-transformed text.
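As a rough illustration of the Lambda-side flow referenced above, the Python sketch below screens text with Amazon Comprehend (English shown; the prototype also applies Bedrock-based checks for other languages), screens a generated image with Amazon Rekognition, and publishes the action payload to SNS. The payload field names, topic ARN, and toxicity threshold are illustrative assumptions; the actual logic lives in the `functions` directory:

```python
import json

import boto3

comprehend = boto3.client("comprehend")
rekognition = boto3.client("rekognition")
sns = boto3.client("sns")

TOXICITY_THRESHOLD = 0.5  # illustrative cutoff, not the prototype's tuned value
TRANSLATION_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:Translation"  # placeholder

def is_text_toxic(text: str) -> bool:
    """Screen an English phrase with Amazon Comprehend toxicity detection."""
    result = comprehend.detect_toxic_content(
        TextSegments=[{"Text": text}], LanguageCode="en"
    )
    return result["ResultList"][0]["Toxicity"] >= TOXICITY_THRESHOLD

def is_image_unsafe(bucket: str, key: str) -> bool:
    """Screen a generated 2D background image with Amazon Rekognition."""
    result = rekognition.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}, MinConfidence=60
    )
    return bool(result["ModerationLabels"])

def publish_translation_action(asl_phrase: str, image_key: str) -> None:
    """Publish a JSON action payload to the Translation SNS topic for Unreal Engine."""
    payload = {"action": "asl_translate", "phrase": asl_phrase, "imageKey": image_key}
    sns.publish(TopicArn=TRANSLATION_TOPIC_ARN, Message=json.dumps(payload))
```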
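On the consumer side, the prototype's Unreal Engine client performs this polling with the AWS SDK for C++; the Python sketch below shows the equivalent long-polling loop for illustration. The queue URL is a placeholder, and the envelope parsing assumes SNS raw message delivery is disabled:

```python
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111122223333/TranslationQueue"  # placeholder

def poll_actions():
    """Long-poll the queue, yield decoded action payloads, then delete each message."""
    while True:
        response = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for message in response.get("Messages", []):
            envelope = json.loads(message["Body"])     # SNS-to-SQS envelope
            payload = json.loads(envelope["Message"])  # original action payload
            yield payload
            sqs.delete_message(
                QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"]
            )
```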
| AWS Service | Role | Description |
|---|---|---|
| Amazon Transcribe | Core | Converts user speech to text. |
| Amazon Bedrock | Core | Invokes a foundation model to translate natural language to ASL. |
| Amazon API Gateway | Core | Exposes the API that invokes Lambda functions from the user interface. |
| AWS Lambda | Core | Runs custom code to generate ASL for simplified text. |
| Amazon Cognito | Core | Authenticates users of the ASL translator. |
| Amazon Comprehend | Core | Runs moderation to detect toxicity in generated text. |
| Amazon Rekognition | Core | Runs moderation to detect toxicity in generated images. |
| Amazon CloudFront | Core | Provides a fast, secure web-hosted user experience. |
| Amazon Simple Storage Service (S3) | Core | Hosts user interface code and stores generated images. |
| Amazon Simple Notification Service (SNS) | Core | Sends notifications to Unreal Engine. |
| Amazon Simple Queue Service (SQS) | Core | Queues notifications for Unreal Engine to consume. |
You are responsible for the cost of the AWS services used while running this Guidance. The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in the us-east-1 (N. Virginia) Region for one month. This estimate is based on the AWS Pricing Calculator output for the full deployment of this Guidance. As of February 2025, the average cost of running this Guidance in us-east-1 is around $1,718/month:
| AWS service | Dimensions | Cost [USD] |
|---|---|---|
| Amazon Transcribe | 5,000 requests, 34 KB request size | <$1/month |
| Amazon Bedrock | 10 users | <$1/month |
| Amazon API Gateway | 5,000 requests, 128 MB memory allocation, 25 s duration | <$1/month |
| AWS Lambda (event trigger) | 5,000 requests, 128 MB memory allocation, 5 s duration | <$1/month |
| Amazon Cognito | 1 GB storage | <$2/month |
| Amazon Comprehend | 10 GB standard storage | <$1/month |
| Amazon Rekognition | 200 input / 300 output tokens per request (5,000 requests) | $44/month |
| Amazon S3 | 200 input / 300 output tokens per request (5,000 requests) | $26/month |
| Amazon SNS | 730 hours x 1.125 USD/hour | $821/month |
| Amazon SQS | 730 hours x 1.125 USD/hour | $821/month |
| **TOTAL** | | **$1,718/month** |
Verify that your environment satisfies the following prerequisites:
- Contains an AWS account
- Has the `AdministratorAccess` policy granted to your AWS account (for production, we recommend restricting access as needed and following the principle of least privilege)
- Provides console and programmatic access
- AWS CLI installed and configured for use with your AWS account
- Node.js 22+ installed
- TypeScript 3.8+ installed
- AWS CDK CLI installed
- Docker installed
- Python 3+ installed
You must explicitly enable access to models before they can be used with the Amazon Bedrock service. Please follow these steps in the Amazon Bedrock User Guide to enable access to the models used in this solution:
- `stability.stable-diffusion-xl`
- `anthropic.claude-v2`
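Before deploying, you can confirm that these models are at least available in your chosen Region with a short boto3 check such as the one below. Note that listing a model only confirms regional availability; access must still be granted in the Amazon Bedrock console as described above:

```python
import boto3

# Control-plane Bedrock client ("bedrock"), distinct from the
# inference client ("bedrock-runtime") used to invoke models.
bedrock = boto3.client("bedrock", region_name="us-east-1")

required = ("anthropic.claude-v2", "stability.stable-diffusion-xl")
listed = [m["modelId"] for m in bedrock.list_foundation_models()["modelSummaries"]]

for model in required:
    # Match on prefix because Bedrock model IDs may carry version suffixes
    # (for example, stability.stable-diffusion-xl-v1).
    found = any(m.startswith(model) for m in listed)
    print(f"{model}: {'available in this Region' if found else 'NOT available'}")
```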
Your AWS account has default quotas, also known as service limits, described here. This Guidance can be installed and tested within the default quotas for each of the services used. You can request increases for some quotas. Note that not all quotas can be increased.
To operate this Guidance at scale, it is important to monitor your usage of AWS services and configure alarm settings to notify you when a quota is close to being exceeded. You can find details on visualizing your service quotas and setting alarms here.
This Guidance uses the Amazon Bedrock service, which is not currently available in all AWS regions. You must launch this solution in an AWS region where Amazon Bedrock is available. For the most current availability of AWS services by Region, refer to the AWS Regional Services List.
American Sign Language (ASL) 3D Avatar Translator on AWS is supported in the following AWS regions (as of Feb 2025):
| Region Name | Region Code |
|---|---|
| US East (Ohio) | us-east-2 |
| Asia Pacific (Seoul) | ap-northeast-2 |
| US East (N. Virginia) | us-east-1 |
| Europe (Paris) | eu-west-3 |
This project is built using the AWS Cloud Development Kit (AWS CDK). Please see Getting Started With the AWS CDK for additional details and prerequisites.
For detailed instructions on deploying and validating this Guidance, please refer to the Implementation Guide.
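For reference, a typical CDK deployment sequence for a TypeScript project like this one looks as follows (the stack name `AwsAslCdkStack` matches the cleanup command later in this document; treat these commands as a sketch and defer to the Implementation Guide):

```
$ npm install
$ cdk bootstrap
$ cdk deploy AwsAslCdkStack
```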
Overall, this early-stage prototype illustrates an end-to-end workflow for 'approximated' American Sign Language (ASL) translation via a 3D avatar. Ideally, the generated pre-ASL (comprehended) output phrases would be forwarded to a robust ASL processing engine, which would then generate corresponding avatar animation data for Unreal Engine.
Suggested future enhancements:
- LiveLink capability
  - Mechanism to dynamically feed external animation data into UE to control individual skeleton joints per animation frame (versus playing a limited selection of pre-built UE animations)
- Animation Montage with Blending
  - Make ASL signing animations smoother and less abrupt, make signed sentences easier to decipher, and allow multiple simultaneous animations across more parts of the skeleton rig (e.g., face and hands)
  - Animate additional body components (e.g., the face)
- Extend the architecture and translation capability to support other sign language variants
- Design/re-architect to support ASL translation processing performed outside of UE
  - Integrate the Lambda-based ASL "pre-processor" logic with a robust external ASL engine/API and compare before/after output
- Application integration
  - Stream ASL animations to major chat programs; perform batch offline processing of videos
Do not forget to delete the Guidance stack to avoid unexpected charges. A sample command to uninstall it is shown below:
```
$ cdk destroy AwsAslCdkStack
```
Then, in the AWS Console, delete the Amazon CloudWatch logs, and empty and delete the S3 buckets.
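If you prefer the AWS CLI for this cleanup, commands along the following lines empty and remove a bucket and delete a log group; the bucket and log group names are placeholders for the resources created by the stack:

```
$ aws s3 rm s3://<bucket-name> --recursive
$ aws s3 rb s3://<bucket-name>
$ aws logs delete-log-group --log-group-name <log-group-name>
```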
This library is licensed under the MIT-0 License. Please see the LICENSE file.
Customers are responsible for making their own independent assessment of the information in this document. This document: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.
We would like to acknowledge the contributions of these editors and reviewers.
- Alexa Perlov, Prototyping Architect
- Alain Krok, Sr Prototyping Architect
- Daniel Zilberman, Sr Solutions Architect - Tech Solutions
- David Israel, Sr Spatial Architect
- Dinesh Sajwan, Sr Prototyping Architect
- Michael Tran, Sr Prototyping Architect