NonVerbal Behavior Generator (NVBG)

Arno Hartholt edited this page Dec 16, 2025 · 4 revisions

Purpose

Generate nonverbal behavior (e.g., select conversational gestures) based on character text and realize this behavior in Unity.

Organization

RideSystemsVH/NonverbalBehaviorGeneratorSystem

Approach

Nonverbal Behavior Generation

Nonverbal behavior is generated by the NVBG module (NonVerbal Behavior Generator). The NVBG is a rule-based system that takes character speech as text and creates a behavior schedule following the Behavior Markup Language (BML). The schedule includes head nods/shakes, gestures, and facial expressions, timed to the words spoken by the character. This schedule is based on both syntactic data obtained from a natural language parser and semantic data defined in rules.
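The overall flow can be sketched as follows. This is a toy keyword-to-behavior table standing in for the real rule files and syntactic parser; the `schedule` helper and rule table are illustrative and not part of NVBG, though the clip names are taken from the rules discussed below.

```python
# Toy rule table: keyword -> behavior (illustrative; the real NVBG loads
# XML rule files and also uses syntactic data from a language parser).
RULES = {
    "you": {"type": "animation", "name": "IdleStandingUpright01_YouLf01"},
    "me":  {"type": "animation", "name": "IdleStandingUpright01_MeLf01"},
}

def schedule(text):
    """Assign a time mark per word and attach a behavior wherever a
    rule keyword matches, mimicking the BML stroke/mark timing."""
    behaviors = []
    for i, word in enumerate(text.split()):
        mark = f"T{i}"
        rule = RULES.get(word.lower().strip(".,?!"))
        if rule:
            behaviors.append({"stroke": mark, **rule})
    return behaviors

print(schedule("Talk to me now."))
# [{'stroke': 'T2', 'type': 'animation', 'name': 'IdleStandingUpright01_MeLf01'}]
```

The real system is considerably richer (nods on noun clauses, priorities, posture-dependent clip selection), but the core idea is the same: per-word time marks plus keyword/syntax rules yield a timed behavior schedule.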

An example BML message can be seen below. It consists of:

  • Participant ID: ID of the virtual human character. This needs to match the character prefab in Unity.
  • Speech ID: ID of the speech. This is either dynamically generated or references a file on disk. In the latter case, the ID needs to match across all four file types:
    • Audio: .ogg
    • Transcript: .txt
    • BML: .xml.txt
    • Lipsync schedule: .bml.txt
  • Marks: tags surrounding each word of the character speech (T0, T1, etc.). These are used to indicate when a behavior should be triggered.
  • Event messages: ActiveMQ messages that are sent out in distributed-messaging versions of a VH system (e.g., on Windows). Not used on iOS. Messages used in this BML are:
    • vrAgentSpeech partial: real-time message of what the VH has said so far
    • vrSpoke: sent when the VH has spoken the entire utterance
  • Head: head nods and gazes. These are procedurally generated, driven by the Amount and Repeats parameters. The timing is based on the reference to the Speech ID and Mark Name.
  • Animation: which gesture animation to play. The timing is based on the reference to the Speech ID and Mark Name.
<?xml version="1.0" encoding="utf-8"?>
<act>
  <participant id="ChrKevin" role="actor" />
  <bml>
    <speech id="BB_BE_100" ref="BB_BE_100" type="application/ssml+xml">
      <mark name="T0" />Got
      <mark name="T1" /><mark name="T2" />it?
      <mark name="T3" /><mark name="T4" />Now
      <mark name="T5" /><mark name="T6" />it's
      <mark name="T7" /><mark name="T8" />your
      <mark name="T9" /><mark name="T10" />turn.
      <mark name="T11" /><mark name="T12" />I'll
      <mark name="T13" /><mark name="T14" />talk
      <mark name="T15" /><mark name="T16" />you
      <mark name="T17" /><mark name="T18" />through
      <mark name="T19" /><mark name="T20" />it.
      <mark name="T21" /></speech>
    <event message="vrAgentSpeech partial VirtualCoachPerson1-20180131165329 T1 Got " stroke="BB_BE_100:T1" />
    <event message="vrAgentSpeech partial VirtualCoachPerson1-20180131165329 T3 Got it? " stroke="BB_BE_100:T3" />
    <event message="vrAgentSpeech partial VirtualCoachPerson1-20180131165329 T5 Got it? Now " stroke="BB_BE_100:T5" />
    <event message="vrAgentSpeech partial VirtualCoachPerson1-20180131165329 T7 Got it? Now it's " stroke="BB_BE_100:T7" />
    <event message="vrAgentSpeech partial VirtualCoachPerson1-20180131165329 T9 Got it? Now it's your " stroke="BB_BE_100:T9" />
    <event message="vrAgentSpeech partial VirtualCoachPerson1-20180131165329 T11 Got it? Now it's your turn. " stroke="BB_BE_100:T11" />
    <event message="vrAgentSpeech partial VirtualCoachPerson1-20180131165329 T13 Got it? Now it's your turn. I'll " stroke="BB_BE_100:T13" />
    <event message="vrAgentSpeech partial VirtualCoachPerson1-20180131165329 T15 Got it? Now it's your turn. I'll talk " stroke="BB_BE_100:T15" />
    <event message="vrAgentSpeech partial VirtualCoachPerson1-20180131165329 T17 Got it? Now it's your turn. I'll talk you " stroke="BB_BE_100:T17" />
    <event message="vrAgentSpeech partial VirtualCoachPerson1-20180131165329 T19 Got it? Now it's your turn. I'll talk you through " stroke="BB_BE_100:T19" />
    <event message="vrAgentSpeech partial VirtualCoachPerson1-20180131165329 T21 Got it? Now it's your turn. I'll talk you through it. " stroke="BB_BE_100:T21" />
    <gaze participant="ChrKevin" target="user" direction="POLAR 0" angle="0" start="sp1:T0" joint-range="HEAD EYES" xmlns:sbm="http://ict.usc.edu" />
    <event message="vrSpoke ChrKevin user VirtualCoachPerson1-20180131165329 Got it? Now it's your turn. I'll talk you through it." stroke="BB_BE_100:relax" xmlns:sbm="http://ict.usc.edu" />
    <!-- First noun clause nod -->
    <head type="NOD" amount="0.10" repeats="1.0" relax="BB_BE_100:T3" priority="5" />
    <!-- Noun clause nod -->
    <head type="NOD" amount="0.10" repeats="1.0" relax="BB_BE_100:T9" priority="5" />
    <!-- You animation -->
    <animation stroke="BB_BE_100:T8" priority="4" name="IdleStandingUpright01_YouLf01" />
    <!-- First noun clause nod -->
    <head type="NOD" amount="0.10" repeats="1.0" relax="BB_BE_100:T17" priority="5" />
    <!-- You animation -->
    <animation stroke="BB_BE_100:T16" priority="4" name="IdleStandingUpright01_YouLf01" />
    <!-- Noun clause nod -->
    <head type="NOD" amount="0.10" repeats="1.0" relax="BB_BE_100:T21" priority="5" />
  </bml>
</act>
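As a rough illustration, a schedule like the one above can be inspected with any standard XML parser. The Python sketch below (using a trimmed-down copy of the BML, with namespaces and most events omitted) simply lists the scheduled behaviors and their sync points; it is not part of the toolkit.

```python
# Sketch: reading a (simplified) BML schedule with Python's standard
# library and listing the behaviors and the marks they are timed to.
import xml.etree.ElementTree as ET

bml_doc = """\
<act>
  <participant id="ChrKevin" role="actor" />
  <bml>
    <speech id="BB_BE_100"><mark name="T0" />Got <mark name="T1" />it?</speech>
    <head type="NOD" amount="0.10" repeats="1.0" relax="BB_BE_100:T3" priority="5" />
    <animation stroke="BB_BE_100:T8" priority="4" name="IdleStandingUpright01_YouLf01" />
  </bml>
</act>"""

root = ET.fromstring(bml_doc)
behaviors = []
for elem in root.find("bml"):
    if elem.tag == "head":
        # Head nods reference their end point via the relax attribute.
        behaviors.append(("head", elem.get("type"), elem.get("relax")))
    elif elem.tag == "animation":
        # Gestures reference their stroke (peak) time mark.
        behaviors.append(("animation", elem.get("name"), elem.get("stroke")))

for b in behaviors:
    print(b)
```

A real BML realizer additionally resolves the `SpeechID:MarkName` references against the lipsync schedule to get absolute times.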

The rules to generate this behavior can be found and edited in \nvbg\nvbg\data\nvbg-toolkit\rule_input_ChrKevin.xml. For example, the following rule indicates that whenever the character says “I” or “me”, it can use one animation when standing, or choose between two animations when sitting:

<rule keyword="me_animation" priority="4">
    <pattern>i</pattern>
    <pattern>me</pattern>
    <animation>
        <posture name="ChrGenericMleAdult@IdleStandingUpright01">
            <clip>IdleStandingUpright01_MeLf01</clip>
        </posture>
        <posture name="ChrGenericMleAdult@IdleSittingUpright01">
            <clip>IdleSittingUpright01_SelfMedLf01</clip>
            <clip>IdleSittingUpright01_SingleTapSmRt01</clip>
        </posture>
    </animation>
</rule>
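A minimal sketch of how such a rule could be matched at runtime, assuming a simplified in-memory representation (the real NVBG loads rules from the XML file above; the `pick_clip` helper and data layout here are illustrative):

```python
# Illustrative in-memory form of the rule above: keywords, priority,
# and posture-dependent candidate clips.
import random

me_rule = {
    "keywords": {"i", "me"},
    "priority": 4,
    "clips_by_posture": {
        "ChrGenericMleAdult@IdleStandingUpright01": ["IdleStandingUpright01_MeLf01"],
        "ChrGenericMleAdult@IdleSittingUpright01": [
            "IdleSittingUpright01_SelfMedLf01",
            "IdleSittingUpright01_SingleTapSmRt01",
        ],
    },
}

def pick_clip(rule, word, posture, rng=random):
    """Return a gesture clip if the word triggers the rule in this posture,
    choosing randomly when the posture offers multiple clips."""
    if word.lower() not in rule["keywords"]:
        return None
    clips = rule["clips_by_posture"].get(posture, [])
    return rng.choice(clips) if clips else None

print(pick_clip(me_rule, "Me", "ChrGenericMleAdult@IdleStandingUpright01"))
# IdleStandingUpright01_MeLf01
```

Keying clips by posture is what lets the realizer stay consistent with the character's current animator state, as described in the realization section below.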

The NVBG can be run either as a separate stand-alone module, used for offline generation of behaviors on Windows, or embedded within Unity, used for real-time generation when interfacing with a cloud NLP AI service on iOS.

For more details on the NVBG, see the main VHToolkit website.

Nonverbal Behavior Realization

The BML schedule described above is interpreted and realized in real-time in Unity. The VH character is set up as a Unity humanoid character.

This character prefab has the following main components:

  • Animator: animation state machine, where individual gestures are grouped by posture (e.g., IdleStandingUpright01). These postures match the ones defined in the NVBG rules, so the system needs to know which posture a VH is in before it can select and play from the appropriate suite of gestures. The animator is layered, with facial expressions and mouth shapes defined in the higher layers.
  • MecanimCharacter: main script that executes VH-related functions, including playing speech and gestures.
  • GazeController_IK: controls character gaze. Targets include any Unity GameObject and can be restricted to eyes, neck, and spine.
  • SaccadeController: controls rapid eye movements. Patterns can be set to Listen, Talk and Think.
  • FacialAnimationPlayer_Animator: controls facial animation.
  • HeadController: controls head nods and shakes.
  • BlinkController: controls blinking.
  • BMLEventHandler: parses BML.
  • ListeningController: controls listening behavior, primarily head nods based on audio pauses when user speaks.
  • MirroringController: controls mirroring behavior, primarily matching character facial expression based on detected facial expression of the user.
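To sketch how a BML event handler might route parsed elements to these controllers, the snippet below uses Python classes as hypothetical stand-ins for the Unity (C#) components listed above; the class and method names are illustrative, not the actual component API.

```python
# Hypothetical stand-ins for the Unity components (names illustrative).
class HeadController:
    def nod(self, amount, repeats):
        return f"nod amount={amount} repeats={repeats}"

class MecanimCharacter:
    def play_animation(self, name):
        return f"play {name}"

def dispatch(element, head, character):
    """Route one parsed BML element (tag, attribute dict) to the
    matching controller, as a BMLEventHandler-style component might."""
    tag, attrs = element
    if tag == "head" and attrs.get("type") == "NOD":
        return head.nod(float(attrs["amount"]), float(attrs["repeats"]))
    if tag == "animation":
        return character.play_animation(attrs["name"])
    return None  # unhandled element types are ignored in this sketch

print(dispatch(("head", {"type": "NOD", "amount": "0.10", "repeats": "1.0"}),
               HeadController(), MecanimCharacter()))
```

In the actual system, BMLEventHandler performs this parsing and dispatch inside Unity, with timing resolved against the lipsync schedule rather than executed immediately.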

Limitations

The generated nonverbal behavior schedule is fairly basic and typically requires a manual pass with the Timeline editor tools.

Known Issues

N/A
