Feature Request: Enhanced Multimedia Support for Audio and Video Files
Summary
I would like to propose adding native support for sending audio and video files through PyWhatKit, extending the current image-only multimedia capabilities to include a broader range of media formats.
Motivation
Currently, PyWhatKit supports sending images but lacks native functionality for audio and video files. This limitation requires developers to implement custom solutions using additional libraries and complex workarounds. Adding native multimedia support would:
- Enhance user experience by supporting modern communication needs
- Simplify development by providing built-in multimedia handling
- Improve reliability through standardized media file processing
- Expand use cases for automation and bulk messaging applications
Proposed Implementation
I have successfully implemented this functionality in my project WhatsAppBlitz using the following approach:
Core Libraries Used:
- OpenCV (opencv-python-headless==4.10.0.84) - For computer vision and UI element detection
- PyAutoGUI (pyautogui==0.9.54) - For automated GUI interactions and file attachment
- Pillow (Pillow==11.2.1) - For image processing and optimization
- phonenumbers (phonenumbers==9.0.5) - For phone number validation
- Cryptography (cryptography==39.0.1) - For secure file handling
Supported Formats:
Audio Files:
- .mp3 - MPEG Audio Layer 3
- .wav - Waveform Audio File Format
- .m4a - MPEG-4 Audio
- .ogg - Ogg Vorbis
- .aac - Advanced Audio Coding
Video Files:
- .mp4 - MPEG-4 Video
- .avi - Audio Video Interleave
- .mov - QuickTime Movie
- .mkv - Matroska Video
- .webm - WebM Video
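As an illustration of how the format lists above could drive media-type auto-detection, here is a minimal sketch. The names `AUDIO_FORMATS`, `VIDEO_FORMATS` and `detect_media_type` are hypothetical and not part of PyWhatKit; the sketch looks only at the file extension and does not inspect file contents.

```python
from pathlib import Path

# Extension sets copied from the lists above; the constant names are hypothetical.
AUDIO_FORMATS = {".mp3", ".wav", ".m4a", ".ogg", ".aac"}
VIDEO_FORMATS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}


def detect_media_type(file_path: str) -> str:
    """Return 'audio' or 'video' based on the (case-insensitive) extension."""
    extension = Path(file_path).suffix.lower()
    if extension in AUDIO_FORMATS:
        return "audio"
    if extension in VIDEO_FORMATS:
        return "video"
    # Anything outside the two lists is rejected rather than guessed at.
    raise ValueError(f"Unsupported format: {extension or '(no extension)'}")
```

Extension-based detection is cheap, but it trusts the file name; a stricter variant could also check the file's actual container format.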
Technical Approach
1. Computer Vision-Based Button Detection
import cv2
import numpy as np
import pyautogui

# Using OpenCV for template matching
def detect_attachment_button(template_path, threshold=0.8):
    """
    Detect the attachment button using template matching
    Args:
        template_path (str): Path to the button template image
        threshold (float): Matching threshold (0-1)
    Returns:
        list: List of (x, y) top-left coordinates of detected matches
    """
    screenshot = pyautogui.screenshot()
    screenshot_cv = cv2.cvtColor(np.array(screenshot), cv2.COLOR_RGB2BGR)
    template = cv2.imread(template_path, cv2.IMREAD_COLOR)
    if template is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(f"Template not found: {template_path}")
    result = cv2.matchTemplate(screenshot_cv, template, cv2.TM_CCOEFF_NORMED)
    # np.where yields (rows, cols); convert them to (x, y) points
    rows, cols = np.where(result >= threshold)
    return [(int(x), int(y)) for x, y in zip(cols, rows)]
2. Robust File Attachment System
import logging
import time
import pyautogui

logger = logging.getLogger(__name__)

# detect_and_click_attachment_button() and wait_for_upload_completion() are
# assumed to be defined elsewhere; they are not shown in this proposal.
def _attach_media(file_path, media_type='auto'):
    """
    Attach audio or video files with intelligent retry mechanism
    Args:
        file_path (str): Path to the media file
        media_type (str): Type of media ('audio', 'video', 'auto')
    Returns:
        bool: True if attachment successful, False otherwise
    """
    max_attempts = 3
    for attempt in range(max_attempts):
        try:
            # Detect and click attachment button
            if detect_and_click_attachment_button():
                # Select file through file dialog
                time.sleep(1)
                pyautogui.write(file_path)
                pyautogui.press('enter')
                # Wait for upload and verify
                if wait_for_upload_completion():
                    return True
        except Exception as e:
            logger.warning(f"Attempt {attempt + 1} failed: {e}")
        # Pause before the next attempt, whether it failed or raised
        time.sleep(2)
    return False
3. File Validation and Optimization
import os
from pathlib import Path

# Extensions from the "Supported Formats" lists above
SUPPORTED_FORMATS = {
    '.mp3', '.wav', '.m4a', '.ogg', '.aac',   # audio
    '.mp4', '.avi', '.mov', '.mkv', '.webm',  # video
}

def validate_media_file(file_path, max_size_mb=100):
    """
    Validate media file format and size
    Args:
        file_path (str): Path to the media file
        max_size_mb (int): Maximum allowed file size in MB
    Returns:
        bool: True if valid
    Raises:
        FileNotFoundError: If the file does not exist
        ValueError: If the file is too large or its format is unsupported
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")
    file_size = os.path.getsize(file_path) / (1024 * 1024)  # MB
    if file_size > max_size_mb:
        raise ValueError(f"File too large: {file_size:.1f}MB > {max_size_mb}MB")
    extension = Path(file_path).suffix.lower()
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {extension}")
    return True
Proposed API Design
New Functions:
# Audio support
pywhatkit.sendwhatmsg_audio(
    phone_no="+1234567890",
    audio_path="path/to/audio.mp3",
    message="Check out this audio!",
    time_hour=14,
    time_min=30
)
# Video support
pywhatkit.sendwhatmsg_video(
    phone_no="+1234567890",
    video_path="path/to/video.mp4",
    message="Here's the video you requested",
    time_hour=15,
    time_min=45
)
# Generic multimedia support
pywhatkit.sendwhatmsg_media(
    phone_no="+1234567890",
    media_path="path/to/file.mp4",
    message="Multimedia message",
    time_hour=16,
    time_min=0,
    media_type="auto"  # auto-detect or specify: 'audio', 'video', 'image'
)
Enhanced Configuration:
# New settings for multimedia handling
pywhatkit.configure_media(
    max_file_size_mb=100,
    upload_timeout=60,
    retry_attempts=3,
    supported_audio_formats=['.mp3', '.wav', '.m4a', '.ogg', '.aac'],
    supported_video_formats=['.mp4', '.avi', '.mov', '.mkv', '.webm']
)
Benefits of This Implementation
1. Reliability
- Computer vision-based button detection can work across different WhatsApp themes, provided a template image is supplied for each theme
- Intelligent retry mechanism handles temporary UI issues
- Comprehensive error handling and logging
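The retry behaviour described above could be factored into a small reusable helper. This is only a sketch under stated assumptions: `retry` and its parameters are hypothetical names, and the `sleep` argument exists solely so the delay can be replaced in tests.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


def retry(action: Callable[[], bool], max_attempts: int = 3,
          delay_s: float = 2.0,
          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `action` until it returns True or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            if action():
                return True
            logger.warning("Attempt %d returned False", attempt)
        except Exception as exc:  # broad on purpose: GUI steps fail in many ways
            logger.warning("Attempt %d failed: %s", attempt, exc)
        if attempt < max_attempts:
            sleep(delay_s)  # no pause after the final attempt
    return False
```

With such a helper, an attachment routine would pass a callable that performs one complete attach attempt and returns whether it succeeded.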
2. Flexibility
- Support for multiple audio and video formats
- Configurable file size limits and timeouts
- Auto-detection of media types
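One way a configurable upload timeout could be honoured is a polling loop with a deadline. The sketch below is an assumption about how that might look, not code from the project: `wait_until` is a hypothetical helper, and the `condition` callable stands in for whatever check reports that the upload has finished.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_s: float = 60.0,
               poll_interval_s: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses."""
    # time.monotonic() is unaffected by system clock adjustments.
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

The condition is always checked at least once, so a timeout of zero still reports an upload that has already completed.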
3. Performance
- Optimized file validation before upload
- Efficient template matching algorithms
- Minimal resource usage with headless OpenCV
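On the template-matching side, thresholding a match map usually reports a cluster of adjacent pixels for a single on-screen button, so the raw results need collapsing before clicking. A minimal sketch, assuming matches arrive as a list of (x, y) tuples; `collapse_matches` and `min_distance` are hypothetical names:

```python
from typing import List, Tuple

Point = Tuple[int, int]


def collapse_matches(points: List[Point], min_distance: int = 10) -> List[Point]:
    """Keep one representative per cluster of nearby match coordinates."""
    kept: List[Point] = []
    for x, y in sorted(points):
        # Keep a point only if it is farther than min_distance (on at least
        # one axis) from every point already kept.
        if all(abs(x - kx) > min_distance or abs(y - ky) > min_distance
               for kx, ky in kept):
            kept.append((x, y))
    return kept
```

This greedy pass is quadratic in the worst case, which should be acceptable for the handful of hits a single button produces.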
4. User Experience
- Simple, intuitive API similar to existing PyWhatKit functions
- Detailed error messages and logging
- Progress tracking for large file uploads
Implementation Considerations
Dependencies:
# Additional requirements for multimedia support
opencv-python-headless>=4.10.0
pyautogui>=0.9.54
numpy>=1.24.0
Platform Compatibility:
- Windows: Full support with Win32 APIs
- macOS: Compatible with Cocoa frameworks
- Linux: X11 support through PyAutoGUI; Wayland sessions may not work reliably
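Because GUI automation depends on the platform and, on Linux, on the display server, a multimedia function could check the environment before attempting any clicks. The following is a sketch only: `describe_gui_session` is a hypothetical helper, and it relies on the conventional `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY` and `DISPLAY` environment variables, which are not guaranteed to be set on every system.

```python
import os
import sys
from typing import Mapping, Optional


def describe_gui_session(platform: Optional[str] = None,
                         env: Optional[Mapping[str, str]] = None) -> str:
    """Classify the current desktop session for GUI automation purposes."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    if platform.startswith("win"):
        return "windows"
    if platform == "darwin":
        return "macos"
    if platform.startswith("linux"):
        # Check Wayland first: XWayland also sets DISPLAY in Wayland sessions.
        if env.get("XDG_SESSION_TYPE", "").lower() == "wayland" or "WAYLAND_DISPLAY" in env:
            return "linux-wayland"
        if "DISPLAY" in env:
            return "linux-x11"
        return "linux-headless"
    return "unknown"
```

A caller could then warn or fail early on sessions where screenshots and synthetic input are unlikely to work.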
WhatsApp Web Compatibility:
- Template-based button detection can be adapted to UI changes by updating the template images
- Support for both light and dark themes
- Responsive to WhatsApp Web updates
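Supporting both themes with template matching most likely means keeping one template image per theme and trying them in turn. A minimal sketch of that fallback follows; `find_with_templates` is a hypothetical helper, and `matcher` is assumed to be any callable that takes a template path and returns a list of (x, y) matches, such as a list-returning version of the detection function proposed earlier.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Point = Tuple[int, int]
Matcher = Callable[[str], List[Point]]


def find_with_templates(template_paths: Iterable[str],
                        matcher: Matcher) -> Optional[Point]:
    """Return the first match from the first template that matches anything."""
    for path in template_paths:
        matches = matcher(path)
        if matches:
            return matches[0]
    return None  # no template matched the current screen
```

A call might look like `find_with_templates(["attach_light.png", "attach_dark.png"], detect_attachment_button)`, where the file names are placeholders for theme-specific screenshots of the button.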
Example Use Cases
Business Applications:
- Customer support: Send instructional videos
- Marketing: Distribute promotional audio content
- Education: Share lecture recordings and presentations
Personal Use:
- Family sharing: Send vacation videos and voice messages
- Content creators: Distribute multimedia content
- Event coordination: Share audio announcements
Backward Compatibility
This enhancement maintains full backward compatibility:
- Existing image functions remain unchanged
- New multimedia functions use similar naming conventions
- Optional dependencies don't affect core functionality
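To keep the extra packages optional, the multimedia functions could check for them only when they are called, leaving the rest of the library importable without them. A sketch under that assumption; `OPTIONAL_MODULES`, `missing_media_dependencies` and `require_media_support` are hypothetical names:

```python
import importlib.util
from typing import List, Sequence

# Import names of the optional packages (opencv-python-headless installs as cv2).
OPTIONAL_MODULES = ("cv2", "pyautogui", "numpy")


def missing_media_dependencies(modules: Sequence[str] = OPTIONAL_MODULES) -> List[str]:
    """Return the optional modules that cannot be found in this environment."""
    # find_spec locates a top-level module without importing it.
    return [name for name in modules if importlib.util.find_spec(name) is None]


def require_media_support() -> None:
    """Raise a clear error if multimedia support cannot be used."""
    missing = missing_media_dependencies()
    if missing:
        raise ImportError("Multimedia support requires: " + ", ".join(missing))
```

Calling `require_media_support()` at the top of each new multimedia function would give users an actionable message instead of a bare import failure.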
References
- OpenCV Documentation: Template Matching
I believe this enhancement would significantly improve PyWhatKit's capabilities and would be happy to collaborate on its implementation. Please let me know if you'd like to see a proof of concept or have any questions about the technical approach.