[FEATURE]: Triggering picture/other actions by voice #1183

@wikijm

Is your feature request related to a problem?

No

Description

Today, actions can be triggered by pressing virtual (touchscreen) or physical buttons, or by making API calls.
How about using voice?
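
As an illustration of the existing API path, a picture can already be started with a plain HTTP GET (endpoint copied from the config.ini further down; host and port depend on the local Photobooth setup), for example from Python:

import urllib.request

# Same call as `curl --request GET http://localhost:14711/commands/start-picture`
with urllib.request.urlopen("http://localhost:14711/commands/start-picture", timeout=5) as resp:
    print(resp.status, resp.reason)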

Describe the solution you'd like

Offline, on-device wake-word detection that triggers actions through API calls.

Describe alternatives you've considered

An independent device or a separate installation providing the service.
Maybe some new actions could be added to the Photobooth API.

Additional context

I've made a script that uses Porcupine, but so far nothing is directly tied to PhotoboothProject:

main.py

import pvporcupine
import pyaudio
import struct
import subprocess
import configparser
import os

# --- CONFIGURATION FROM config.ini ---
config = configparser.ConfigParser()
config_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'config.ini')
config.read(config_path)

ACCESS_KEY = config['PORCUPINE']['ACCESS_KEY']  # Picovoice AccessKey from the Picovoice Console; required by pvporcupine.create()
KEYWORDS = [k.strip() for k in config['PORCUPINE']['KEYWORDS'].split(',')]
KEYWORD_PATHS = [k.strip() for k in config['PORCUPINE']['KEYWORD_PATHS'].split(',')]
MODEL_PATH = config['PORCUPINE']['MODEL_PATH']
COMMAND_TO_EXECUTE = config['PORCUPINE']['COMMAND_TO_EXECUTE']

# --- END OF CONFIGURATION ---

try:
    # Initialize Porcupine
    porcupine = pvporcupine.create(
        access_key=ACCESS_KEY,
        keyword_paths=KEYWORD_PATHS,
        model_path=MODEL_PATH
    )

    # Initialize audio stream with PyAudio
    pa = pyaudio.PyAudio()
    audio_stream = pa.open(
        rate=porcupine.sample_rate,
        channels=1,
        format=pyaudio.paInt16,
        input=True,
        frames_per_buffer=porcupine.frame_length
    )

    print(f"Ready. Listening for keywords: {', '.join(KEYWORDS)}")
    print("Press Ctrl+C to exit.")

    while True:
        pcm = audio_stream.read(porcupine.frame_length)
        pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)

        # Process audio with Porcupine
        keyword_index = porcupine.process(pcm)

        # If a keyword is detected (keyword_index >= 0)
        if keyword_index >= 0:
            detected = KEYWORDS[keyword_index]
            print(f"Keyword '{detected}' detected!")
            
            # --- TRIGGER YOUR ACTION HERE ---
            print(f"Executing command: '{COMMAND_TO_EXECUTE}'")
            subprocess.Popen(COMMAND_TO_EXECUTE.split())
            print("Waiting for the next detection...")


except KeyboardInterrupt:
    print("Script stopped.")
finally:
    if 'porcupine' in locals() and porcupine is not None:
        porcupine.delete()
    if 'audio_stream' in locals() and audio_stream is not None:
        audio_stream.close()
    if 'pa' in locals() and pa is not None:
        pa.terminate()

config.ini

[PORCUPINE]
ACCESS_KEY = 
KEYWORDS = sorbet citron
KEYWORD_PATHS = sorbet-citron_fr_linux_v3_0_0.ppn
MODEL_PATH = porcupine_params_fr.pv
COMMAND_TO_EXECUTE = curl --request GET http://localhost:14711/commands/start-picture
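
If useful, the curl subprocess could also be replaced by a direct HTTP call from Python, which drops the external curl dependency. A minimal sketch (same endpoint as COMMAND_TO_EXECUTE above; the cooldown value is only an assumption to avoid double triggers while a picture is still being taken):

import time
import urllib.request
import urllib.error

TRIGGER_URL = "http://localhost:14711/commands/start-picture"  # same endpoint as above
COOLDOWN_S = 5  # assumed pause between triggers

_last_trigger = 0.0

def trigger_photobooth():
    """Fire the Photobooth API call directly, with a simple cooldown."""
    global _last_trigger
    now = time.monotonic()
    if now - _last_trigger < COOLDOWN_S:
        return False
    _last_trigger = now
    try:
        with urllib.request.urlopen(TRIGGER_URL, timeout=5) as resp:
            return 200 <= resp.status < 300
    except urllib.error.URLError as err:
        print(f"Trigger failed: {err}")
        return False

Such a function could be called in the detection loop in place of subprocess.Popen(COMMAND_TO_EXECUTE.split()).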

Labels: enhancement (New feature or request)