Enhanced Multimedia Support for Audio and Video Files #354

@Javi-CD

Feature Request: Enhanced Multimedia Support for Audio and Video Files

Summary

I would like to propose adding native support for sending audio and video files through PyWhatKit, extending the current image-only multimedia capabilities to include a broader range of media formats.

Motivation

Currently, PyWhatKit supports sending images but lacks native functionality for audio and video files. This limitation requires developers to implement custom solutions using additional libraries and complex workarounds. Adding native multimedia support would:

  • Enhance user experience by supporting modern communication needs
  • Simplify development by providing built-in multimedia handling
  • Improve reliability through standardized media file processing
  • Expand use cases for automation and bulk messaging applications

Proposed Implementation

I have successfully implemented this functionality in my project WhatsAppBlitz using the following approach:

Core Libraries Used:

  • OpenCV (opencv-python-headless==4.10.0.84) - For computer vision and UI element detection

  • PyAutoGUI (pyautogui==0.9.54) - For automated GUI interactions and file attachment

  • Pillow (Pillow==11.2.1) - For image processing and optimization

  • phonenumbers (phonenumbers==9.0.5) - For phone number validation

  • Cryptography (cryptography==39.0.1) - For secure file handling

Supported Formats:

Audio Files:

  • .mp3 - MPEG Audio Layer 3
  • .wav - Waveform Audio File Format
  • .m4a - MPEG-4 Audio
  • .ogg - Ogg Vorbis
  • .aac - Advanced Audio Coding

Video Files:

  • .mp4 - MPEG-4 Video
  • .avi - Audio Video Interleave
  • .mov - QuickTime Movie
  • .mkv - Matroska Video
  • .webm - WebM Video

Technical Approach

1. Computer Vision-Based Button Detection

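# Using OpenCV for template matching
import cv2
import numpy as np
import pyautogui


def detect_attachment_button(template_path, threshold=0.8):
    """
    Detect the attachment button using template matching

    Args:
        template_path (str): Path to the button template image
        threshold (float): Matching threshold (0-1)
    Returns:
        list: List of (x, y) top-left coordinates of detected matches
    """
    template = cv2.imread(template_path, cv2.IMREAD_COLOR)
    if template is None:
        # cv2.imread returns None instead of raising when the file is unreadable
        raise FileNotFoundError(f"Template not found or unreadable: {template_path}")

    # PyAutoGUI returns a PIL image; force RGB, then convert to OpenCV's BGR order
    screenshot = pyautogui.screenshot().convert("RGB")
    screenshot_cv = cv2.cvtColor(np.array(screenshot), cv2.COLOR_RGB2BGR)

    result = cv2.matchTemplate(screenshot_cv, template, cv2.TM_CCOEFF_NORMED)

    # np.where yields (rows, cols); convert to a list of (x, y) points
    ys, xs = np.where(result >= threshold)
    return [(int(x), int(y)) for x, y in zip(xs, ys)]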
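
Thresholded template matching usually reports a cluster of neighbouring hits around each real button, so the coordinates generally need collapsing before anything is clicked. A minimal sketch of that post-processing step, assuming matches are (x, y) top-left corners. The names `dedupe_matches`, `click_point`, and `min_distance` are hypothetical and not taken from WhatsAppBlitz or PyWhatKit:

```python
def dedupe_matches(points, min_distance=10):
    """Keep one representative per cluster of nearby (x, y) matches."""
    kept = []
    for x, y in sorted(points):
        # Keep the point only if it is far enough from every point kept so far
        if all(abs(x - kx) > min_distance or abs(y - ky) > min_distance
               for kx, ky in kept):
            kept.append((x, y))
    return kept


def click_point(top_left, template_size):
    """Return the centre of a match from its top-left corner and (width, height)."""
    x, y = top_left
    width, height = template_size
    return (x + width // 2, y + height // 2)


# Three hits on the same button plus one hit elsewhere (illustrative values)
raw = [(100, 200), (101, 200), (100, 201), (400, 50)]
unique = dedupe_matches(raw)
centres = [click_point(p, (24, 24)) for p in unique]
```

Clicking the centre rather than the top-left corner is the safer choice, because the corner of a template often falls on the button's border or padding.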

2. Robust File Attachment System

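import logging
import time

import pyautogui

logger = logging.getLogger(__name__)

# detect_and_click_attachment_button() and wait_for_upload_completion()
# are project helpers that are not shown in this issue.


def _attach_media(file_path, media_type='auto'):
    """
    Attach audio or video files with a fixed-delay retry mechanism

    Args:
        file_path (str): Path to the media file
        media_type (str): Type of media ('audio', 'video', 'auto');
            not used in this excerpt
    Returns:
        bool: True if attachment successful, False otherwise
    """
    max_attempts = 3

    for attempt in range(max_attempts):
        try:
            # Detect and click attachment button
            if detect_and_click_attachment_button():
                # Select file through file dialog
                time.sleep(1)
                pyautogui.write(file_path)
                pyautogui.press('enter')

                # Wait for upload and verify
                if wait_for_upload_completion():
                    return True

            logger.warning(f"Attempt {attempt + 1} did not complete")
        except Exception as e:
            logger.warning(f"Attempt {attempt + 1} failed: {e}")

        # Pause before the next attempt, whether it failed quietly or raised
        if attempt < max_attempts - 1:
            time.sleep(2)

    return False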
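
The issue does not show `wait_for_upload_completion`. A minimal sketch of a polling loop that such a helper could be built on, assuming completion can be expressed as a function that returns True or False. `wait_until` and `upload_finished` are hypothetical names, and the stand-in condition below only counts calls; a real check (for example, template matching for the send button) is not shown here:

```python
import time


def wait_until(predicate, timeout=60.0, interval=0.5):
    """Poll predicate() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in condition for demonstration: becomes true on the third check
checks = {"count": 0}


def upload_finished():
    checks["count"] += 1
    return checks["count"] >= 3


completed = wait_until(upload_finished, timeout=5.0, interval=0.01)
timed_out = wait_until(lambda: False, timeout=0.05, interval=0.01)
```

`time.monotonic()` is used instead of `time.time()` so that system clock adjustments cannot shorten or extend the timeout.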

3. File Validation and Optimization

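import os
from pathlib import Path

# Extensions from the "Supported Formats" lists above
SUPPORTED_FORMATS = {
    '.mp3', '.wav', '.m4a', '.ogg', '.aac',   # audio
    '.mp4', '.avi', '.mov', '.mkv', '.webm',  # video
}


def validate_media_file(file_path, max_size_mb=100):
    """
    Validate media file format and size

    Args:
        file_path (str): Path to the media file
        max_size_mb (int): Maximum allowed file size in MB
    Returns:
        bool: True if the file is valid
    Raises:
        FileNotFoundError: If the path does not point to an existing file
        ValueError: If the file is too large or its format is unsupported
    """
    if not os.path.isfile(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")

    file_size = os.path.getsize(file_path) / (1024 * 1024)  # MB
    if file_size > max_size_mb:
        raise ValueError(f"File too large: {file_size:.1f}MB > {max_size_mb}MB")

    extension = Path(file_path).suffix.lower()
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {extension}")

    return True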

Proposed API Design

New Functions:

# Audio support
pywhatkit.sendwhatmsg_audio(
    phone_no="+1234567890",
    audio_path="path/to/audio.mp3",
    message="Check out this audio!",
    time_hour=14,
    time_min=30
)

# Video support
pywhatkit.sendwhatmsg_video(
    phone_no="+1234567890",
    video_path="path/to/video.mp4",
    message="Here's the video you requested",
    time_hour=15,
    time_min=45
)

# Generic multimedia support
pywhatkit.sendwhatmsg_media(
    phone_no="+1234567890",
    media_path="path/to/file.mp4",
    message="Multimedia message",
    time_hour=16,
    time_min=0,
    media_type="auto"  # auto-detect or specify: 'audio', 'video', 'image'
)
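
One way the `media_type="auto"` option could be resolved is by file extension. The following is a sketch under that assumption, not the WhatsAppBlitz implementation. `resolve_media_type` is a hypothetical helper, and the image extensions are illustrative because the issue does not list them:

```python
from pathlib import Path

AUDIO_FORMATS = {".mp3", ".wav", ".m4a", ".ogg", ".aac"}
VIDEO_FORMATS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}
# Assumption: illustrative image extensions, not taken from the issue
IMAGE_FORMATS = {".jpg", ".jpeg", ".png"}


def resolve_media_type(media_path, media_type="auto"):
    """Return 'audio', 'video', or 'image' for a path, honouring an explicit type."""
    if media_type != "auto":
        if media_type not in ("audio", "video", "image"):
            raise ValueError(f"Unknown media type: {media_type}")
        return media_type

    extension = Path(media_path).suffix.lower()
    if extension in AUDIO_FORMATS:
        return "audio"
    if extension in VIDEO_FORMATS:
        return "video"
    if extension in IMAGE_FORMATS:
        return "image"
    raise ValueError(f"Unsupported format: {extension}")


detected = resolve_media_type("path/to/file.mp4")
forced = resolve_media_type("path/to/file.ogg", media_type="video")
```

An explicit `media_type` still matters with auto-detection in place, since container formats such as `.ogg` and `.webm` can hold either audio-only or video content and the extension alone cannot tell them apart.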

Enhanced Configuration:

# New settings for multimedia handling
pywhatkit.configure_media(
    max_file_size_mb=100,
    upload_timeout=60,
    retry_attempts=3,
    supported_audio_formats=['.mp3', '.wav', '.m4a', '.ogg', '.aac'],
    supported_video_formats=['.mp4', '.avi', '.mov', '.mkv', '.webm']
)
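
The settings passed to the proposed `configure_media` call could be held in a small validated object. A minimal sketch, assuming the same parameter names and defaults as the call above. `MediaConfig` and `_normalise_extension` are hypothetical names and not part of PyWhatKit:

```python
from dataclasses import dataclass, field


def _normalise_extension(ext):
    """Lowercase an extension and make sure it starts with a dot."""
    ext = ext.strip().lower()
    return ext if ext.startswith(".") else "." + ext


@dataclass
class MediaConfig:
    max_file_size_mb: int = 100
    upload_timeout: int = 60
    retry_attempts: int = 3
    supported_audio_formats: list = field(
        default_factory=lambda: [".mp3", ".wav", ".m4a", ".ogg", ".aac"])
    supported_video_formats: list = field(
        default_factory=lambda: [".mp4", ".avi", ".mov", ".mkv", ".webm"])

    def __post_init__(self):
        # Reject values that would make uploads impossible
        if self.max_file_size_mb <= 0:
            raise ValueError("max_file_size_mb must be positive")
        if self.upload_timeout <= 0:
            raise ValueError("upload_timeout must be positive")
        if self.retry_attempts < 1:
            raise ValueError("retry_attempts must be at least 1")
        # Accept 'MP3' or '.Mp3' and store '.mp3'
        self.supported_audio_formats = [
            _normalise_extension(e) for e in self.supported_audio_formats]
        self.supported_video_formats = [
            _normalise_extension(e) for e in self.supported_video_formats]


config = MediaConfig(upload_timeout=120, supported_audio_formats=["MP3", ".Wav"])
```

Validating once at configuration time keeps the send functions free of repeated range checks.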

Benefits of This Implementation

1. Reliability

  • Computer vision-based button detection works across different WhatsApp themes
  • Intelligent retry mechanism handles temporary UI issues
  • Comprehensive error handling and logging

2. Flexibility

  • Support for multiple audio and video formats
  • Configurable file size limits and timeouts
  • Auto-detection of media types

3. Performance

  • Optimized file validation before upload
  • Efficient template matching algorithms
  • Minimal resource usage with headless OpenCV

4. User Experience

  • Simple, intuitive API similar to existing PyWhatKit functions
  • Detailed error messages and logging
  • Progress tracking for large file uploads

Implementation Considerations

Dependencies:

# Additional requirements for multimedia support
opencv-python-headless>=4.10.0
pyautogui>=0.9.54
numpy>=1.24.0

Platform Compatibility:

  • Windows: Full support with Win32 APIs
  • macOS: Compatible with Cocoa frameworks
  • Linux: X11 support through PyAutoGUI; Wayland sessions may need to fall back to X11

WhatsApp Web Compatibility:

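  • Template-based button detection tolerates layout changes, since it finds the button by appearance rather than by fixed coordinates
  • Support for both light and dark themes, given a template image for each
  • Templates can be refreshed when a WhatsApp Web update changes the button's appearance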

Example Use Cases

Business Applications:

  • Customer support: Send instructional videos
  • Marketing: Distribute promotional audio content
  • Education: Share lecture recordings and presentations

Personal Use:

  • Family sharing: Send vacation videos and voice messages
  • Content creators: Distribute multimedia content
  • Event coordination: Share audio announcements

Backward Compatibility

This enhancement maintains full backward compatibility:

  • Existing image functions remain unchanged
  • New multimedia functions use similar naming conventions
  • Optional dependencies don't affect core functionality
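
Keeping the multimedia dependencies optional could be done by checking for them only when a multimedia function is called. A minimal sketch of such a guard using the standard library. `missing_optional_modules` and `require_multimedia_support` are hypothetical names, and the module list assumes the import names of the packages listed under Dependencies:

```python
import importlib.util

# Import names (not PyPI package names) of the optional multimedia dependencies
OPTIONAL_MODULES = ("cv2", "pyautogui", "numpy")


def missing_optional_modules(modules=OPTIONAL_MODULES):
    """Return the top-level modules that cannot be found, without importing them."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def require_multimedia_support(modules=OPTIONAL_MODULES):
    """Raise ImportError naming any missing module; do nothing if all are present."""
    missing = missing_optional_modules(modules)
    if missing:
        raise ImportError("Multimedia support requires: " + ", ".join(missing))
```

A new function such as `sendwhatmsg_media` could call `require_multimedia_support()` on entry, so that the existing text and image functions never touch the extra packages.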

I believe this enhancement would significantly improve PyWhatKit's capabilities and would be happy to collaborate on its implementation. Please let me know if you'd like to see a proof of concept or have any questions about the technical approach.
