Add Emotion Analyzer (new) #527

Merged
k-okada merged 18 commits into jsk-ros-pkg:master from ayaha-n:emotion_analyzer_new
May 10, 2025

Conversation

@ayaha-n
Contributor

@ayaha-n ayaha-n commented Apr 14, 2025

@mqcmd196
Sorry, it seems that eye_status and other unrelated changes were included. I have opened a new pull request, so I would appreciate it if you could review it.

I created an emotion_analyzer that uses Hume AI. Usage is described in the README; it supports:

  • text to emotion
  • audio (wav file) to emotion
  • audio (recorded from /audio) to emotion

However, the audio (record from /audio) to emotion case behaves a bit strangely. When I use a ReSpeaker and run

roslaunch emotion_analyzer emotion_analyzer.launch api_key:=<api_key>
roslaunch emotion_analyzer capture.launch 
rosservice call /analyze_audio "audio_file: ''"

I get

result: "{\"prosody\": null, \"burst\": [{\"name\": \"Admiration\", \"score\": 0.04438357800245285},\
  \ {\"name\": \"Adoration\", \"score\": 0.03663531318306923}, {\"name\": \"Aesthetic\
  \ Appreciation\", \"score\": 0.0275471992790699}, {\"name\": \"Amusement\", \"score\"\
  : 0.15527257323265076}, {\"name\": \"Anger\", \"score\": 0.01609768718481064}, {\"\
  name\": \"Anxiety\", \"score\": 0.09838512539863586}, {\"name\": \"Awe\", \"score\"\
  : 0.0516996867954731}, {\"name\": \"Awkwardness\", \"score\": 0.17967936396598816},\
  \ {\"name\": \"Boredom\", \"score\": 0.18468798696994781}, {\"name\": \"Calmness\"\
  , \"score\": 0.11194272339344025}, {\"name\": \"Concentration\", \"score\": 0.039446331560611725},\
  \ {\"name\": \"Contemplation\", \"score\": 0.07749783992767334}, {\"name\": \"Confusion\"\
  , \"score\": 0.07849768549203873}, {\"name\": \"Contempt\", \"score\": 0.10015680640935898},\
  \ {\"name\": \"Contentment\", \"score\": 0.07816632837057114}, {\"name\": \"Craving\"\
  , \"score\": 0.043796356767416}, {\"name\": \"Determination\", \"score\": 0.022754769772291183},\
  \ {\"name\": \"Disappointment\", \"score\": 0.15245404839515686}, {\"name\": \"\
  Disgust\", \"score\": 0.0362621434032917}, {\"name\": \"Distress\", \"score\": 0.16373476386070251},\
  \ {\"name\": \"Doubt\", \"score\": 0.11785085499286652}, {\"name\": \"Ecstasy\"\
  , \"score\": 0.1383388191461563}, {\"name\": \"Embarrassment\", \"score\": 0.10717830061912537},\
  \ {\"name\": \"Empathic Pain\", \"score\": 0.0768926814198494}, {\"name\": \"Entrancement\"\
  , \"score\": 0.040821533650159836}, {\"name\": \"Envy\", \"score\": 0.021487215533852577},\
  \ {\"name\": \"Excitement\", \"score\": 0.07412480562925339}, {\"name\": \"Fear\"\
  , \"score\": 0.06701570749282837}, {\"name\": \"Guilt\", \"score\": 0.03692568093538284},\
  \ {\"name\": \"Horror\", \"score\": 0.039663150906562805}, {\"name\": \"Interest\"\
  , \"score\": 0.09799767285585403}, {\"name\": \"Joy\", \"score\": 0.13371771574020386},\
  \ {\"name\": \"Love\", \"score\": 0.06643084436655045}, {\"name\": \"Nostalgia\"\
  , \"score\": 0.045610249042510986}, {\"name\": \"Pain\", \"score\": 0.1008995845913887},\
  \ {\"name\": \"Pride\", \"score\": 0.034173380583524704}, {\"name\": \"Realization\"\
  , \"score\": 0.07668226957321167}, {\"name\": \"Relief\", \"score\": 0.10585669428110123},\
  \ {\"name\": \"Romance\", \"score\": 0.0844399556517601}, {\"name\": \"Sadness\"\
  , \"score\": 0.08523412048816681}, {\"name\": \"Satisfaction\", \"score\": 0.2191394865512848},\
  \ {\"name\": \"Desire\", \"score\": 0.14677052199840546}, {\"name\": \"Shame\",\
  \ \"score\": 0.07419771701097488}, {\"name\": \"Surprise (negative)\", \"score\"\
  : 0.020901966840028763}, {\"name\": \"Surprise (positive)\", \"score\": 0.038737643510103226},\
  \ {\"name\": \"Sympathy\", \"score\": 0.042055580765008926}, {\"name\": \"Tiredness\"\
  , \"score\": 0.17382484674453735}, {\"name\": \"Triumph\", \"score\": 0.03647517040371895}]}"
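For reference, the `result` field is a JSON string, so the top-scoring emotions can be pulled out with a few lines of stdlib Python. This is just an illustrative sketch that parses a shortened stand-in for the output above; it is not part of the package:

```python
import json

# A shortened stand-in for the `result` string returned by /analyze_audio.
result = ('{"prosody": null, "burst": ['
          '{"name": "Satisfaction", "score": 0.2191394865512848}, '
          '{"name": "Boredom", "score": 0.18468798696994781}, '
          '{"name": "Awkwardness", "score": 0.17967936396598816}]}')

data = json.loads(result)
# "burst" may be null when analysis fails (as in the built-in-mic case below),
# so fall back to an empty list before sorting.
emotions = data.get("burst") or []
top = sorted(emotions, key=lambda e: e["score"], reverse=True)[:3]
for e in top:
    print(f'{e["name"]}: {e["score"]:.3f}')
```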

so recording → analysis appears to work. However, when I try to use the PC's built-in microphone with

roslaunch emotion_analyzer emotion_analyzer.launch api_key:=<api_key>
roslaunch emotion_analyzer capture.launch device:=hw:0,6 channels:=2 sample_rate:=48000
rosservice call /analyze_audio "audio_file: ''"

I get

result: "{\"prosody\": null, \"burst\": null}"

and when I check the saved recording at /home/leus/tmp/hoge.wav, the file either cannot be played back, or, if it can, it sounds like noise that differs from what I actually said.
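One way to narrow down whether the capture or the analysis is at fault is to inspect the saved file's header with Python's stdlib `wave` module and compare it against the channels/sample rate passed to capture.launch. A minimal sketch (the helper name is hypothetical; the path in the comment is the one mentioned above):

```python
import wave

def inspect_wav(path):
    """Print the header parameters of a WAV file, to spot mismatches
    (e.g. a file captured with the wrong channel count or sample rate)."""
    with wave.open(path, "rb") as w:
        print(f"channels:     {w.getnchannels()}")
        print(f"sample rate:  {w.getframerate()} Hz")
        print(f"sample width: {w.getsampwidth() * 8} bit")
        print(f"duration:     {w.getnframes() / w.getframerate():.2f} s")

# e.g. inspect_wav("/home/leus/tmp/hoge.wav")
```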

Running arecord -l gives

**** List of CAPTURE Hardware Devices ****
card 0: sofhdadsp [sof-hda-dsp], device 0: HDA Analog (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: sofhdadsp [sof-hda-dsp], device 1: HDA Digital (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: sofhdadsp [sof-hda-dsp], device 6: DMIC (*) []
  Subdevices: 0/1
  Subdevice #0: subdevice #0
card 0: sofhdadsp [sof-hda-dsp], device 7: DMIC16kHz (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: ArrayUAC10 [ReSpeaker 4 Mic Array (UAC1.0)], device 0: USB Audio [USB Audio]
  Subdevices: 0/1
  Subdevice #0: subdevice #0

With devices other than (1,0) and (0,6) (and sometimes even with (0,6)), running capture.launch fails:

... logging to /home/leus/.ros/log/603feb96-1919-11f0-a47f-6f63903075d1/roslaunch-leus-ThinkPad-P16s-Gen-2-18937.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://leus-ThinkPad-P16s-Gen-2:40087/

SUMMARY
========

PARAMETERS
 * /audio_capture/bitrate: 128
 * /audio_capture/channels: 2
 * /audio_capture/depth: 16
 * /audio_capture/device: hw:0,7
 * /audio_capture/dst: appsink
 * /audio_capture/format: wave
 * /audio_capture/sample_format: S16LE
 * /audio_capture/sample_rate: 48000
 * /rosdistro: noetic
 * /rosversion: 1.16.0

NODES
  /
    audio_capture (audio_capture/audio_capture)

ROS_MASTER_URI=http://localhost:11311

process[audio_capture-1]: started with pid [18966]
[ERROR] [1744625746.458648274]: gstreamer: Internal data stream error.
[audio_capture-1] process has died [pid 18966, exit code 1, cmd /opt/ros/noetic/lib/audio_capture/audio_capture audio:=audio __name:=audio_capture __log:=/home/leus/.ros/log/603feb96-1919-11f0-a47f-6f63903075d1/audio_capture-1.log].
log file: /home/leus/.ros/log/603feb96-1919-11f0-a47f-6f63903075d1/audio_capture-1*.log
all processes on machine have died, roslaunch will exit
shutting down processing monitor...
... shutting down processing monitor complete
done

So the audio_capture node dies with a gstreamer error, as shown above.

@mqcmd196 mqcmd196 mentioned this pull request Apr 15, 2025
Member

@mqcmd196 mqcmd196 left a comment


Nice feature.

  1. Please check ayaha-n#3 and merge it if you don't have any problems.
  2. Please check and fix your code following the comments in each line.

@mqcmd196
Member

Waiting for some rosdep keys to be merged

@mqcmd196
Member

@a-ichikura @sawada10
Please check that this specification meets your usage.

@ayaha-n
Contributor Author

ayaha-n commented Apr 16, 2025

@iory @mqcmd196 Thanks for your comments. I fixed it:

  • delete unnecessary comments
  • translate comments from Japanese to English
  • check audio format
  • warn if there is no audio data

I further have to do these:

  • check if all the comments are written in English
  • update the README to note that the audio format must be set: roslaunch audio_capture capture.launch format:=wave
  • check the sample launch again
  • check the case using ReSpeaker

@mqcmd196
Member

I'm sorry, but I edited a rosdep key incorrectly. This patch should fix the issue.

@sawada10
Contributor

@a-ichikura @sawada10 Please check if this specification meets your needs.

Sorry for the late reply, and thank you for all the maintenance work.

@ayaha-n As you mentioned during today's lab meeting, is the current issue that when voice information is used as input, the analysis is based on the audio from two seconds earlier (i.e., it's not real-time)?

@a-ichikura and I used it during the mamoru experiment, but since we were using text data at the time, I don't think that specific use case is particularly relevant here.
Looking ahead, I’m vaguely thinking it would be useful if the robot could analyze the state of the person it’s talking to and adjust its conversation or interaction style accordingly.
Since it’s unlikely that someone’s emotions would change drastically every two seconds, I think the current implementation is still quite usable.

@ayaha-n
Contributor Author

ayaha-n commented Apr 22, 2025

@sawada10 Thanks for your reply.

@ayaha-n As you mentioned during today's lab meeting, is the current issue that when voice information is used as input, the analysis is based on the audio from two seconds earlier (i.e., it's not real-time)?

Yes. I initially thought it would be better to start analyzing audio only after the request, but when you want to analyze audio from a microphone, it is unlikely that you would have only 2 seconds of audio; in that case a streaming style would be used instead. So the present implementation, which analyzes the audio after the request AND the 2 seconds before it, is fine.
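The behavior described here (keeping a short pre-roll of audio so a request can also cover the moments just before it) can be sketched with a bounded buffer. All names and parameters below are illustrative, not the package's actual implementation:

```python
from collections import deque

class RollingAudioBuffer:
    """Keep only the most recent pre_roll_sec seconds of audio samples."""

    def __init__(self, sample_rate, pre_roll_sec):
        # deque with maxlen silently discards the oldest samples on overflow.
        self.buf = deque(maxlen=sample_rate * pre_roll_sec)

    def on_audio(self, samples):
        # Called for each incoming audio chunk (e.g. from the /audio topic).
        self.buf.extend(samples)

    def snapshot(self):
        # At request time, return the pre-roll audio accumulated so far.
        return list(self.buf)

# Tiny rates for illustration: 4 samples/s, 2 s of pre-roll = 8 samples kept.
buf = RollingAudioBuffer(sample_rate=4, pre_roll_sec=2)
for chunk_start in range(0, 20, 4):
    buf.on_audio(range(chunk_start, chunk_start + 4))
print(len(buf.snapshot()))  # 8 -- only the last 2 s of samples are retained
```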

@mqcmd196
Member

@sawada10 @a-ichikura
Finally, please ensure that this package meets the requirements of your application. If you need any new features, please send a PR to this repository to enhance the emotion_analyzer package.

@a-ichikura
Contributor

I checked the text_to_emotion function, and it exactly meets our needs.
Thank you for developing!

Member

@mqcmd196 mqcmd196 left a comment


CI green. LGTM

@mqcmd196 mqcmd196 requested a review from k-okada May 1, 2025 15:28
@k-okada k-okada merged commit 3bd763d into jsk-ros-pkg:master May 10, 2025
16 checks passed
