Guidance for American Sign Language (ASL) 3D Avatar Translator on AWS

Table of Contents

  1. Overview
  2. High-level Workflow
  3. Architecture
  4. Prerequisites
  5. Deployment and Validation Steps
  6. Next Steps
  7. Cleanup
  8. License
  9. Notices
  10. Authors

This repository contains documentation for a Spatial Computing/GenAI prototype solution that was presented at the AWS re:Invent 2023 conference. It includes the following major components within the source directory:

  • A CDK-deployable project [see cdk.json, the lib directory, and the bin directory]
  • Lambda code for basic GenAI-based ASL translation logic and request/message passing [see the functions directory]
  • Sample code for a CloudFront-based web interface that lets users speak or type phrases to be translated [see the frontend directory]
  • An Unreal Engine 5.3 C++ project that displays ASL translations performed by a realistic 3D avatar [see the ue directory]

Please monitor this repository for future updates as more code and documentation become available.

Overview

This sample code/prototype demonstrates a novel way to pre-process multilingual phrases into an equivalent literal (or direct) form for translation into American Sign Language (ASL). This pre-processing step improves sign language translation fidelity by expressing user-provided input phrases more clearly than they were originally expressed. GenAI is applied to re-interpret the multilingual input phrases into simpler, more explicit English phrases across multiple iterations/passes. These resulting phrases admit a more faithful interpretation than the phrases produced by traditional, non-GenAI translation tools.

Finally, this prototype animates a realistic avatar in Unreal Engine (via the MetaHuman plugin) to visually depict the ASL translation of those resulting phrases. ASL translation in this prototype is based on a very loose/naïve interpretation of ASL rules and grammar, primarily involving hand and arm movements, all of which end users can refine. The main goals of this project are to improve the translations produced by existing robust ASL translation engines (by pre-processing their input with GenAI), and to provide an engaging multimodal interface for viewing ASL translations.
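
To make the pre-processing concrete, the minimal Python sketch below shows one way such a multi-pass simplification loop could call Amazon Bedrock. It assumes the anthropic.claude-v2 model listed under Prerequisites; the prompt wording, pass count, and function names are illustrative only, not the repository's actual Lambda code.

    # Minimal sketch of the multi-pass simplification step, using boto3 and the
    # anthropic.claude-v2 model enabled in the Prerequisites. The prompt wording,
    # pass count, and function names are illustrative, not the repository's code.
    import json

    import boto3

    bedrock = boto3.client("bedrock-runtime")

    def simplify_once(phrase: str) -> str:
        """Ask Claude v2 to restate a phrase in simpler, more literal English."""
        prompt = (
            "\n\nHuman: Restate the following phrase in simple, literal English, "
            "preserving its meaning. Reply with the restated phrase only.\n"
            f"Phrase: {phrase}\n\nAssistant:"
        )
        response = bedrock.invoke_model(
            modelId="anthropic.claude-v2",
            body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 200}),
        )
        return json.loads(response["body"].read())["completion"].strip()

    def simplify(phrase: str, passes: int = 3) -> str:
        """Run several simplification passes before the phrase is signed."""
        for _ in range(passes):
            phrase = simplify_once(phrase)
        return phrase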

(Demo video: translation-example.mp4)

High-level Workflow

  1. An end user speaks (or types) a phrase in a spoken language of their choice
  2. The spoken phrase is transcribed to text (as sketched below)
  3. The transcribed phrase is translated into English via generative AI and then simplified across multiple iterations using carefully crafted Amazon Bedrock prompts
  4. An avatar in Unreal Engine animates the ASL gestures ("signs") corresponding to the simplified transcription
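
For the transcription step (2), a minimal batch-style call to Amazon Transcribe could look like the Python sketch below. The bucket, key, and job names are hypothetical placeholders, and the actual web frontend may use Transcribe's streaming interface instead.

    # Minimal sketch of the transcription step with Amazon Transcribe's batch API.
    # Bucket, key, and job names are hypothetical placeholders; the web frontend
    # may instead use Transcribe's streaming interface.
    import time

    import boto3

    transcribe = boto3.client("transcribe")

    transcribe.start_transcription_job(
        TranscriptionJobName="asl-input-phrase-001",                 # placeholder
        Media={"MediaFileUri": "s3://my-input-bucket/phrase.wav"},   # placeholder
        IdentifyLanguage=True,   # detect the spoken language automatically
        OutputBucketName="my-transcripts-bucket",                    # placeholder
    )

    # Poll until the job finishes; the transcript JSON lands in the output bucket.
    while True:
        job = transcribe.get_transcription_job(
            TranscriptionJobName="asl-input-phrase-001"
        )["TranscriptionJob"]
        if job["TranscriptionJobStatus"] in ("COMPLETED", "FAILED"):
            break
        time.sleep(2)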

Architecture

Reference Architecture

Architecture Diagram Workflow

  1. The user authenticates with Amazon Cognito through the Amazon CloudFront-hosted website; subsequent web API calls carry a Cognito-issued JWT access token.
  2. The user types or speaks an input phrase in a chosen language, which Amazon Transcribe transcribes. The transcription is stored in an Amazon Simple Storage Service (Amazon S3) bucket.
  3. The user requests an action (such as ASL translate, change avatar, or change background image) through the website or web API (an Amazon API Gateway endpoint).
  4. Based on the requested action, API Gateway routes the request to the corresponding AWS Lambda function for processing.
  5. For ASL translation requests, the matching AWS Lambda function invokes the Amazon Bedrock API to form an ASL phrase from the input phrase and to obtain a contextual 2D image, which is stored in an S3 bucket.
  6. Amazon Comprehend and Amazon Bedrock perform multilingual toxicity checks on the input phrase, and Amazon Rekognition performs visual toxicity checks on the generated 2D images. Toxicity check results are returned to the respective Lambda functions.
  7. Each Lambda function generates a JSON payload capturing the user-requested action for Epic Games Unreal Engine and sends it to the corresponding Amazon Simple Notification Service (Amazon SNS) topic: Translation or Non-Translation (see the sketch after this list).
  8. Each Amazon SNS payload is delivered to its corresponding Amazon Simple Queue Service (Amazon SQS) queue for later consumption by Unreal Engine.
  9. Using the AWS SDK, the Unreal Engine application polls and dequeues the action payloads from its Amazon SQS queues; for translation requests, background images are fetched from an S3 bucket.
  10. For each payload received, the Unreal Engine application performs the requested action and displays the resulting video output on the user's system. This output provides an ASL-equivalent interpretation of the input phrase: a MetaHuman 3D avatar animation with the ASL-transformed text displayed.
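
Steps 7 through 9 follow a standard SNS-to-SQS fan-out. The Python sketch below illustrates both sides of that pattern with boto3; the topic ARN, queue URL, and payload field names are hypothetical, and in this project the consumer side actually runs inside Unreal Engine via the AWS C++ SDK.

    # Illustrative SNS -> SQS fan-out for action payloads (steps 7-9).
    # Topic ARN, queue URL, and payload fields are hypothetical placeholders;
    # the actual consumer is the Unreal Engine application via the AWS C++ SDK.
    import json

    import boto3

    sns = boto3.client("sns")
    sqs = boto3.client("sqs")

    # Producer side (Lambda): publish a JSON action payload to the topic.
    payload = {
        "action": "ASL_TRANSLATE",                  # hypothetical field names
        "phrase": "WHERE NEAREST STATION",
        "backgroundImageKey": "images/station.png",
    }
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:TranslationTopic",
        Message=json.dumps(payload),
    )

    # Consumer side (shown here in Python for illustration): poll and dequeue.
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/TranslationQueue"
    messages = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10
    ).get("Messages", [])
    for msg in messages:
        body = json.loads(msg["Body"])          # SNS delivery envelope
        action = json.loads(body["Message"])    # the original payload
        print("Perform action:", action["action"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])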

AWS Services in this Guidance

| AWS Service | Role | Description |
| --- | --- | --- |
| Amazon Transcribe | Core | Converts user speech to text |
| Amazon Bedrock | Core | Invokes foundation models to translate natural language to ASL |
| Amazon API Gateway | Core | Provides the API that invokes Lambda functions from the user interface |
| AWS Lambda | Core | Runs custom code to generate ASL for simplified text |
| Amazon Cognito | Core | Authenticates users to access the ASL translator |
| Amazon Comprehend | Core | Runs moderation to detect toxicity in generated text |
| Amazon Rekognition | Core | Runs moderation to detect toxicity in generated images |
| Amazon CloudFront | Core | Provides a fast, secure web-hosted user experience |
| Amazon Simple Storage Service (S3) | Core | Hosts user interface code and stores generated images |
| Amazon Simple Notification Service (SNS) | Core | Sends notifications to Unreal Engine |
| Amazon Simple Queue Service (SQS) | Core | Queues notifications for Unreal Engine to consume |

Cost

You are responsible for the cost of the AWS services used while running this Guidance.

Sample Cost Table

The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in the us-east-1 (N. Virginia) Region for one month. The estimate is based on AWS Pricing Calculator output for a full deployment of this Guidance. As of February 2025, the average cost of running this Guidance in us-east-1 is approximately $1,718 per month:

| AWS Service | Dimensions | Cost [USD] |
| --- | --- | --- |
| Amazon Transcribe | 5,000 requests, 34 KB request size | <$1/month |
| Amazon Bedrock | 10 users | <$1/month |
| Amazon API Gateway | 5,000 requests, 128 MB memory allocation, 25 s duration | <$1/month |
| AWS Lambda (event trigger) | 5,000 requests, 128 MB memory allocation, 5 s duration | <$1/month |
| Amazon Cognito | 1 GB storage | <$2/month |
| Amazon Comprehend | 10 GB standard storage | <$1/month |
| Amazon Rekognition | 200 input / 300 output tokens per request (5,000 requests) | $44/month |
| Amazon S3 | 200 input / 300 output tokens per request (5,000 requests) | $26/month |
| Amazon SNS | 730 hours × 1.125 USD/hour | $821/month |
| Amazon SQS | 730 hours × 1.125 USD/hour | $821/month |
| TOTAL | | $1,718/month |

Prerequisites

Verify that your environment satisfies the following prerequisites:

  1. An AWS account
  2. The AdministratorAccess policy granted to your IAM identity (for production, we recommend restricting access as needed and following the principle of least privilege)
  3. Console and programmatic access to the account
  4. AWS CLI installed and configured for your AWS account
  5. Node.js 22+ installed
  6. TypeScript 3.8+ installed
  7. AWS CDK CLI installed
  8. Docker installed
  9. Python 3+ installed

You must explicitly enable access to models before they can be used with the Amazon Bedrock service. Please follow these steps in the Amazon Bedrock User Guide to enable access to the models used in this solution:

  • stability.stable-diffusion-xl
  • anthropic.claude-v2
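
As a quick sanity check, the Python sketch below lists the foundation models Bedrock offers in your default Region and flags the two above. Note that listing alone does not prove access has been granted; enabling access in the Bedrock console (and, if needed, a test invocation) remains the definitive check.

    # Illustrative check: list the foundation models Bedrock offers in your
    # default Region and flag the two this Guidance relies on. Listing shows
    # models offered in the Region; access itself must still be granted in the
    # Bedrock console, and a test invocation is the definitive check.
    import boto3

    bedrock = boto3.client("bedrock")  # control-plane client, not bedrock-runtime

    offered = {
        summary["modelId"]
        for summary in bedrock.list_foundation_models()["modelSummaries"]
    }

    for needed in ("anthropic.claude-v2", "stability.stable-diffusion-xl"):
        # Prefix match, since some model IDs carry version suffixes (e.g. "-v1").
        found = any(model_id.startswith(needed) for model_id in offered)
        print(f"{needed}: {'offered in this Region' if found else 'NOT offered'}")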

Service Quotas

Your AWS account has default quotas, also known as service limits, described here. This Guidance can be installed and tested within the default quotas for each of the services used. You can request increases for some quotas. Note that not all quotas can be increased.

To operate this Guidance at scale, it is important to monitor your usage of AWS services and configure alarm settings to notify you when a quota is close to being exceeded. You can find details on visualizing your service quotas and setting alarms here.

Supported AWS Regions

This Guidance uses the Amazon Bedrock service, which is not currently available in all AWS regions. You must launch this solution in an AWS region where Amazon Bedrock is available. For the most current availability of AWS services by Region, refer to the AWS Regional Services List.

American Sign Language (ASL) 3D Avatar Translator on AWS is supported in the following AWS regions (as of Feb 2025):

| Region Name | Region Code |
| --- | --- |
| US East (Ohio) | us-east-2 |
| Asia Pacific (Seoul) | ap-northeast-2 |
| US East (N. Virginia) | us-east-1 |
| Europe (Paris) | eu-west-3 |

Deployment and Validation Steps

Deployment

This project is built using the AWS Cloud Development Kit (AWS CDK). Please see Getting Started with the AWS CDK for additional details and prerequisites.

For detailed instructions on deployment and validation of this Guidance, please refer to the Implementation Guide.

Next Steps

Overall, this early-stage prototype illustrates an end-to-end workflow for 'approximated' American Sign Language (ASL) translation via a 3D avatar. Ideally, the generated pre-ASL (comprehended) output phrases would be forwarded to a robust ASL processing engine, which would then generate corresponding avatar animation data to supply to Unreal Engine.

Suggested future enhancements:

  • LiveLink capability
    • A mechanism to dynamically feed external animation data into UE to control individual skeleton joints per animation frame (versus playing a limited selection of pre-built UE animations)
  • Animation montage with blending
    • Make ASL signing animations smoother, more natural, and less abrupt; make signed sentences easier to decipher; and allow multiple simultaneous animations across more parts of the skeleton rig (e.g., face, hands)
  • Animations of additional body components (e.g., the face)
  • Extend the architecture and translation capability to support other sign language variants
  • Design/re-architect to support ASL translation processing performed outside of UE
  • Integrate the Lambda-based ASL "pre-processor" logic with a robust external ASL engine/API, and compare before/after output
  • Application integration
    • Stream ASL animations to major chat programs, and perform batch offline processing of videos

Cleanup

Do not forget to delete the Guidance stack to avoid unexpected charges. A sample command to uninstall it is shown below:

    $ cdk destroy AwsAslCdkStack

Then, in the AWS Console, delete the Amazon CloudWatch logs, and empty and delete the S3 buckets created by this Guidance.

License

This library is licensed under the MIT-0 License. Please see the LICENSE file.

Notices

Customers are responsible for making their own independent assessment of the information in this document. This document: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

Authors

We would like to acknowledge the contributions of these editors and reviewers.

  • Alexa Perlov, Prototyping Architect
  • Alain Krok, Sr Prototyping Architect
  • Daniel Zilberman, Sr Solutions Architect - Tech Solutions
  • David Israel, Sr Spatial Architect
  • Dinesh Sajwan, Sr Prototyping Architect
  • Michael Tran, Sr Prototyping Architect
