
Webhose/webzio-firehose-api-consumer


Webz.io Firehose Consumer

A simple and robust Python client for consuming real-time data from the Webz.io Firehose API with automatic pagination, intelligent error handling, and rate limiting.

Features

  • Real-time data consumption with automatic pagination
  • Intelligent rate limiting - handles HTTP 429 with increasing backoff (10s, then +5s per consecutive 429, capped at 60s)
  • Comprehensive error handling - network errors, HTTP errors, and timeouts
  • Command-line interface - no code editing required
  • Detailed logging - timestamps, response times, status codes, and URLs
  • Zero posts handling - automatic retry when no new data is available

Quick Start

  1. Install dependencies:
     pip install requests
  2. Run the consumer:
     python3 simple_consumer.py --token YOUR_TOKEN --firehose YOUR_FIREHOSE_NAME
  3. Stop the consumer: press Ctrl+C to stop gracefully.

Usage Examples

Basic consumption (last 5 minutes)

python3 simple_consumer.py --token abc123xyz --firehose news_feed

Start from 10 minutes ago

python3 simple_consumer.py --token abc123xyz --firehose news_feed --start-minutes 10

Show all available options

python3 simple_consumer.py --help

Command Line Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| --token | Yes | - | Your Webz.io API token (provided by the Webz.io team) |
| --firehose | Yes | - | Your firehose name (provided by the Webz.io team) |
| --start-minutes | No | 5 | Start consuming from X minutes ago |
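To show how these three options might be parsed, here is a minimal sketch using the standard argparse module. The helper names parse_args and initial_since_ms are hypothetical, not taken from simple_consumer.py. It also assumes that --start-minutes is turned into the since query parameter as a Unix timestamp in milliseconds. The sample URLs below point that way (for example since=1752998917000), but this README does not document it.

```python
import argparse
import time


def parse_args(argv=None):
    """Parse the three documented options (hypothetical sketch, not the script's code)."""
    parser = argparse.ArgumentParser(description="Consume a Webz.io firehose")
    parser.add_argument("--token", required=True, help="Webz.io API token")
    parser.add_argument("--firehose", required=True, help="Firehose name")
    parser.add_argument("--start-minutes", type=int, default=5,
                        help="Start consuming from X minutes ago (default: 5)")
    return parser.parse_args(argv)


def initial_since_ms(start_minutes, now=None):
    """Convert 'X minutes ago' into epoch milliseconds.

    Assumption: the API's `since` parameter is epoch milliseconds,
    inferred from the sample URLs rather than documented here.
    """
    now = time.time() if now is None else now
    return int((now - start_minutes * 60) * 1000)
```

In this sketch argparse handles a missing --token or --firehose with its own usage error. The real script prints its own message instead (see Troubleshooting).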

Output Format

The script provides detailed real-time logs for each API request:

Successful requests

[2025-07-20 11:10:46] Request #250: 100 posts | Status: 200 | Response time: 0.09s | API since: 07-20 11:08:37 | Total posts: 20256
  URL: https://api.webz.io/firehose?token=...&since=1752998917000&nid=...
  → Posts found, continuing to next page...

Zero posts (waiting for new data)

[2025-07-20 11:10:49] Request #251: 0 posts | Status: 200 | Response time: 0.03s | API since: 07-20 11:08:40 | Total posts: 20256
  URL: https://api.webz.io/firehose?token=...&since=1752998920000&nid=...
  → No posts found, sleeping 2 seconds...

Rate limiting (HTTP 429)

[2025-07-20 11:10:52] Request #252: RATE LIMIT ERROR | Status: 429 | Response time: 0.02s
  → Waiting 10 seconds due to rate limit...
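The success log line can be rebuilt from a handful of values, which helps when parsing or reproducing these logs. The sketch below uses a hypothetical format_success_line helper that copies the field order shown above. Passing tz=timezone.utc renders the sample since value 1752998917000 as 08:08:37, while the sample log shows 11:08:37. That difference suggests the script prints local time, but this is an inference, not something the README states.

```python
from datetime import datetime, timezone


def format_success_line(request_no, posts, status, elapsed_s, since_ms,
                        total_posts, now=None, tz=None):
    """Build a log line shaped like the sample output (hypothetical helper).

    tz=None renders local time; pass timezone.utc for a deterministic result.
    """
    if now is None:
        now = datetime.now(tz)
    since = datetime.fromtimestamp(since_ms / 1000, tz)
    return (
        f"[{now:%Y-%m-%d %H:%M:%S}] Request #{request_no}: {posts} posts"
        f" | Status: {status} | Response time: {elapsed_s:.2f}s"
        f" | API since: {since:%m-%d %H:%M:%S} | Total posts: {total_posts}"
    )
```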

Error Handling

Rate Limiting (HTTP 429)

The script automatically handles rate limits with intelligent backoff:

  • First rate limit: waits 10 seconds
  • Subsequent rate limits: adds 5 seconds each time (10s → 15s → 20s → 25s... up to 60s max)
  • Reset: returns to 10 seconds after a successful request
  • No request counting: rate-limited requests don't increment the request counter

Note: If you frequently receive 429 errors, contact the Webz.io team to increase your API rate limits.
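The schedule above is a linear backoff with a cap. The minimal sketch below captures it in a hypothetical RateLimitBackoff class, which does not come from simple_consumer.py. It waits 10 seconds on the first 429, adds 5 seconds for each consecutive 429 up to 60 seconds, and resets after a successful request.

```python
class RateLimitBackoff:
    """Linear backoff for HTTP 429 (hypothetical helper).

    10 s on the first 429, +5 s per consecutive 429, capped at 60 s,
    and back to 10 s after a request that was not rate limited.
    """

    def __init__(self, initial=10, step=5, maximum=60):
        self.initial = initial
        self.step = step
        self.maximum = maximum
        self._next = initial

    def on_rate_limit(self):
        """Return the wait for this 429 and grow the wait for the next one."""
        wait = self._next
        self._next = min(self._next + self.step, self.maximum)
        return wait

    def on_success(self):
        """Reset the schedule after a successful request."""
        self._next = self.initial
```

A consumer loop would call time.sleep(backoff.on_rate_limit()) on a 429 without incrementing its request counter, and call backoff.on_success() after any other response.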

HTTP Errors (401, 403, 404, 500, etc.)

  • Logs the error with status code
  • Waits 2 seconds between retries
  • Retries indefinitely until the error clears

Network Errors

  • Handles connection timeouts, DNS issues, etc.
  • Logs detailed error information
  • Automatic retry with 2-second delay
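The HTTP and network error policy (log, wait 2 seconds, retry indefinitely) can be sketched as a small retry wrapper. fetch_with_retry is a hypothetical name, and fetch is any zero-argument callable that returns an object with a status_code attribute, for example functools.partial(session.get, url, timeout=30). The sketch catches OSError because requests' RequestException (the base of its ConnectionError and Timeout) subclasses IOError, which is OSError in Python 3. An HTTP 429 is handed back to the caller, which applies the rate-limit schedule described above.

```python
import time


def fetch_with_retry(fetch, delay=2.0, sleep=time.sleep, log=print):
    """Call fetch() until it returns a success or rate-limit response.

    Hypothetical sketch: network failures (OSError, which covers requests'
    exceptions) and HTTP errors other than 429 are logged and retried after
    `delay` seconds; a 429 is returned so the caller can back off.
    """
    while True:
        try:
            response = fetch()
        except OSError as exc:
            log(f"Network error: {exc!r}; retrying in {delay:g}s")
            sleep(delay)
            continue
        if response.status_code == 429 or response.status_code < 400:
            return response
        log(f"HTTP error {response.status_code}; retrying in {delay:g}s")
        sleep(delay)
```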

Zero Posts Handling

  • When API returns 0 posts, waits 2 seconds before retry
  • Maintains the same URL until new data arrives
  • Prevents excessive API calls during quiet periods

Pagination

The script automatically follows pagination based on the API response:

  1. Posts found: immediately continues to next page
  2. Zero posts returned: waits 2 seconds, then retries same URL
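The two pagination rules can be sketched as one loop. This README does not describe the response body, so the sketch assumes that each page is a decoded JSON dict with a posts list and a next URL. Those field names are an assumption, and consume and get_page are hypothetical names. max_requests exists only so the loop can be tested; the real consumer runs until stopped.

```python
import time


def consume(get_page, first_url, handle_post, max_requests=None,
            idle_delay=2.0, sleep=time.sleep):
    """Follow firehose pages (hypothetical sketch).

    Assumes get_page(url) returns a dict with a "posts" list and a
    "next" URL; these field names are not documented in this README.
    """
    url = first_url
    requests_made = total_posts = 0
    while max_requests is None or requests_made < max_requests:
        page = get_page(url)
        requests_made += 1
        posts = page.get("posts", [])
        if posts:
            for post in posts:
                handle_post(post)
            total_posts += len(posts)
            url = page["next"]  # posts found: continue straight to the next page
        else:
            sleep(idle_delay)  # caught up: wait, then retry the same URL
    return requests_made, total_posts
```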

Why 2 seconds?

  • Caught up: when no posts are returned, you've caught up to real time, so there is nothing to fetch until new data arrives
  • Efficient polling: short enough to pick up new data quickly, long enough to avoid excessive API calls

Stopping the Consumer

  • Press Ctrl+C to stop gracefully
  • Shows final statistics (total requests and posts processed)
  • Properly closes all connections

Example:

^C
Stopped by user after 1,247 requests
Total posts processed: 125,890
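A graceful Ctrl+C stop usually means catching KeyboardInterrupt around the main loop, printing totals, and closing resources in a finally block. In the minimal sketch below, run_until_interrupted and step are hypothetical names. step performs one request and returns the number of posts it processed, and close stands in for cleanup such as session.close(). The summary lines use the same thousands separators as the example above.

```python
def run_until_interrupted(step, close=lambda: None, log=print):
    """Run step() until Ctrl+C, then report totals (hypothetical sketch)."""
    requests_made = total_posts = 0
    try:
        while True:
            total_posts += step()
            requests_made += 1
    except KeyboardInterrupt:
        log(f"Stopped by user after {requests_made:,} requests")
        log(f"Total posts processed: {total_posts:,}")
    finally:
        close()  # e.g. session.close() to release pooled connections
    return requests_made, total_posts
```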

Technical Details

  • HTTP Timeout: 30 seconds per request
  • Session reuse: Maintains persistent connections for better performance
  • Memory efficient: Processes data in real-time without accumulation
  • Timestamp format: Human-readable date/time in logs
  • URL extraction: Shows actual API URLs for debugging

Requirements

  • Python 3.6+
  • requests library

Troubleshooting

"Command 'python' not found"

Use python3 instead of python:

python3 simple_consumer.py --help

"Error: API token and firehose name are required!"

Make sure you provide both required parameters:

python3 simple_consumer.py --token YOUR_TOKEN --firehose YOUR_FIREHOSE

Continuous 401 errors

  • Verify your API token is correct
  • Contact Webz.io team to confirm token is active

Continuous 429 errors

  • The script handles these automatically with backoff
  • If they persist, ask the Webz.io team for higher rate limits

Support

  • Get credentials: Contact the Webz.io team for your API token and firehose name
  • Issues: Check the detailed logs for specific error messages
  • Performance: The script is optimized for continuous long-running consumption

This consumer is designed for production use with robust error handling and automatic recovery.

About

Simple Python client for consuming Webz.io Firehose API
