Getting Started
The Virtual Human Toolkit (VHToolkit) consists of a Unity application that includes characters, nonverbal behavior generation, and nonverbal behavior realization, together with integrations to audio-visual sensing, speech recognition, natural language processing, and text-to-speech services. It comes with two sample Unity projects, for each of its main rendering pipelines:
- URP (Universal Rendering Pipeline): lower fidelity graphics that run on all platforms, including mobile and the web
- HDRP (High Definition Rendering Pipeline): higher fidelity graphics that mainly run on desktops
Choose whichever project best fits your hardware platform. Both projects have feature parity, differing only in the Unity rendering pipeline used. For details on downloading and installing Unity, see below.
The VHToolkit uses cloud AI services. The provided executable release runs out of the box. For the Unity projects, you'll need to create and enter your own API keys in the configuration file. Most of these services offer a free tier. For the sample project, the main required services are:
- Azure Speech, for speech recognition. In the Azure console, create a Speech resource and create an API key from the API Access menu.
- ChatGPT, for natural language processing. In the Organization Settings of the API Platform, select the API Keys menu to create your key.
- AWS Polly, for text-to-speech synthesis. In the AWS Console, create an AWS Polly resource, and create an access key and a secret key.
To add your own API keys to the VHToolkit, in Unity:
- Use the Debug menu at the top left and go to the Config submenu
- Click Open Folder Location and open the ride.json file
- Add your keys to the appropriate sections:
- AzureSpeech
- OpenAIChatGPT
- AWSPolly
- Save the file
- Restart the Unity project
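As a rough sketch, the relevant sections of ride.json might look like the following. The section names come from the steps above, but the field names inside each section are assumptions for illustration; keep whatever fields the shipped file already defines and only fill in your key values:

```json
{
  "AzureSpeech": {
    "apiKey": "<your Azure Speech key>",
    "region": "<your Azure region>"
  },
  "OpenAIChatGPT": {
    "apiKey": "<your OpenAI API key>"
  },
  "AWSPolly": {
    "accessKey": "<your AWS access key>",
    "secretKey": "<your AWS secret key>"
  }
}
```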
The VHToolkit is powered by the RIDE platform, which is distributed through packages. These packages can be used independently for Unity-based AI development, with native integrations with AWS, Azure, OpenAI, and Stability AI, among others.
The VHToolkit Unity sample projects use the following packages:
- RIDE.Cognition: contains interfaces, implementations, and samples for audio-visual sensing, speech recognition, natural language processing, and text-to-speech
- RIDE.VH: contains the interface, implementation, and sample for nonverbal behavior generation
Both packages automatically pull in dependent packages, including RIDE.Abstract and RIDE.Core, which contain the main RIDE API and implementations for core functionality, including logging, configuration, web service interfaces, etc.
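For reference, Unity packages and their versions are declared in a project's Packages/manifest.json. The identifiers and versions below are placeholders for illustration only; the sample projects already ship with the correct entries, so there is normally no need to edit this file:

```json
{
  "dependencies": {
    "com.example.ride.cognition": "1.0.0",
    "com.example.ride.vh": "1.0.0"
  }
}
```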
- Unity version: 6000.1.5f1
- Unity Editor system requirements
- Development environment:
- Main development OS: Windows
- IDE: MS Visual Studio 2022
- Supporting tools: Git (required for certain Unity packages)
- Lip sync generation for pre-recorded audio: FaceFX (requires a separate license)
- Download and install the Unity Hub
- Run the Unity Hub and log in with a Unity account
- Note there's no need to download a Unity version yet
- Choose a project folder per the preferred rendering pipeline, either VHUnityHDRP or VHUnityURP
- In the main project folder, run 'runUnity.bat' for Windows, or 'runUnity.sh' for macOS
- Open and Play the main scene, Assets/Scenes/SampleScene
- Guidance for use of the scene is displayed in the initial Overview menu of the Debug Menu in the upper left of the screen.
- Advance to the Main panel, click Character, and select from characters made in-house by the ICT, or a subset of characters from the Microsoft Rocketbox Avatar library.
- Use the various capability buttons to switch between and compare different cloud services.
- The default Build Profile is for Windows, and includes only the SampleScene. VH character asset bundles will download at runtime, similar to when playing in-editor.
- Note: if generating a local standalone build, the initial compile may take 1.5+ hours due to HDRP/URP shader compilation. Subsequent iterative compiles will be much quicker.