This repository contains a powerful and flexible Telegram scraper designed to collect message data and media files from Telegram channels or groups. Built with Python and the Telethon library, this scraper is ideal for data analysts, researchers, and developers looking to gather and analyze data from Telegram for various purposes, such as social network analysis, market research, or content monitoring.
- Features
- Requirements
- Installation
- Usage
- Configuration
- How It Works
- Error Handling
- Contributing
- License
- Asynchronous Scraping: Utilizes Python's asyncio and the Telethon library for efficient, non-blocking data collection.
- Data and Media Download: Fetches messages, along with any attached media (images, videos, etc.), and saves them locally.
- Error Management: Includes mechanisms to handle rate limits (FloodWaitError) and other common exceptions gracefully.
- Customizable: Allows users to define scraping parameters such as group/channel name, message limits, and scraping intervals.
- Data Consolidation: Merges individual CSV files containing messages and metadata into a single, comprehensive dataset.
- Python 3.7 or later
- Telethon
- nest_asyncio for enabling nested event loops
- Clone the Repository
    git clone https://github.com/Amirwpi/Telegram_Scraper.git
    cd Telegram_Scraper
- Install Dependencies
  Install the required Python libraries using pip:
    pip install -r requirements.txt
  Note: Ensure that Telethon and nest_asyncio are listed in your requirements.txt.
- Obtain your API ID and hash from the Telegram API.
- Update the placeholders in the script (api_id, api_hash, and phone_number) with your actual credentials.
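As an alternative to editing the placeholders directly, the credentials could be read from environment variables so they never end up in version control. The sketch below is only an illustration: the variable names TELEGRAM_API_ID, TELEGRAM_API_HASH, and TELEGRAM_PHONE and the helper load_credentials are assumptions, not part of the repository's scripts.

```python
import os
from typing import Mapping, NamedTuple


class Credentials(NamedTuple):
    api_id: int
    api_hash: str
    phone_number: str


# Hypothetical variable names; choose whatever fits your environment.
REQUIRED_VARS = ("TELEGRAM_API_ID", "TELEGRAM_API_HASH", "TELEGRAM_PHONE")


def load_credentials(env: Mapping[str, str] = os.environ) -> Credentials:
    """Read the three credentials from a mapping (the process environment by default)."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Credentials(
        api_id=int(env["TELEGRAM_API_ID"]),  # Telegram API IDs are integers
        api_hash=env["TELEGRAM_API_HASH"],
        phone_number=env["TELEGRAM_PHONE"],
    )
```

The returned values can then be assigned to api_id, api_hash, and phone_number in place of the hard-coded placeholders.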
Execute the following command to authenticate your Telegram session (the quotes are needed because the file name contains spaces):
    python "01-Telegram Session Generator.py"
Follow the on-screen instructions to complete authentication and retrieve your session string.
After obtaining the session string, run the main scraper to start collecting data:
    python "02-Telegram Scraper.py"
Once the scraping is complete, use the merging script to consolidate all CSV files into a single file:
    python 03-merge_output_files.py
Note: The merging script has not been uploaded yet; it will be added later.
Before running the scraper, customize the following settings in main_scraper.py:
- api_id and api_hash: Your Telegram API credentials.
- Session: The Telegram session string obtained from "01-Telegram Session Generator.py".
- group_title: The Telegram group or channel from which to scrape data.
- limit_msg: Maximum number of messages to fetch per request.
- Repeat_number: Number of iterations for repeated scraping.
- datetime_before: The initial timestamp to begin scraping messages from.
Important Note: Adjust the Repeat_number, datetime_before, and other parameters based on your specific data and channel requirements. It's essential to monitor the scraping process to ensure that it is functioning correctly. If an error occurs, use the last file's timestamp to continue scraping from where it stopped.
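Resuming from the last file's timestamp can be sketched as follows. This is a rough illustration under several assumptions that the repository does not confirm: that each output CSV has a column named date holding ISO 8601 timestamps, and that scraping walks backwards in time from datetime_before, so the oldest timestamp in the last file is the point to continue from. ScraperConfig, oldest_timestamp, and resume_from are hypothetical names.

```python
import csv
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class ScraperConfig:
    """Hypothetical container mirroring the settings listed above."""
    group_title: str
    limit_msg: int
    repeat_number: int
    datetime_before: datetime


def oldest_timestamp(csv_path: str, column: str = "date") -> datetime:
    """Return the oldest timestamp in one output CSV (the column name is an assumption)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        stamps = [datetime.fromisoformat(row[column]) for row in csv.DictReader(handle)]
    if not stamps:
        raise ValueError(f"{csv_path} contains no rows")
    return min(stamps)


def resume_from(config: ScraperConfig, last_csv_path: str) -> ScraperConfig:
    """Copy the config with datetime_before moved to where the last file stopped."""
    return replace(config, datetime_before=oldest_timestamp(last_csv_path))
```

If your output files store timestamps under a different column or format, adjust the column argument and the parsing accordingly.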
The first part of the scraper handles authentication with the Telegram API. It checks for authorization, manages two-factor authentication, and provides a session string for subsequent scraping tasks.
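A minimal sketch of how a session string can be produced with Telethon is shown here. It is not the repository's actual generator script; the function name generate_session_string is hypothetical, and it relies on Telethon's interactive start() flow, which prompts for the login code and for the two-factor password when one is set.

```python
import asyncio


async def generate_session_string(api_id: int, api_hash: str, phone_number: str) -> str:
    """Log in interactively and return a reusable session string."""
    # Imported lazily so this module can be loaded without Telethon installed.
    from telethon import TelegramClient
    from telethon.sessions import StringSession

    # An empty StringSession keeps the session in memory instead of a .session file.
    client = TelegramClient(StringSession(), api_id, api_hash)
    await client.start(phone=phone_number)
    try:
        return client.session.save()
    finally:
        await client.disconnect()


# Example call (requires real credentials and network access):
# print(asyncio.run(generate_session_string(12345, "your_api_hash", "+10000000000")))
```

The session string grants full access to the account, so it should be stored as carefully as a password.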
The main scraping script uses the GetHistoryRequest function to iteratively fetch messages and media from the specified group or channel. It handles Telegram's rate limits by catching FloodWaitError exceptions and waiting before retrying requests.
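The iterative fetch described above can be sketched as a small pagination loop. To keep the example runnable without a Telegram connection, the network call is passed in as fetch_page, a hypothetical callable; the loop itself, the dictionary message shape, and the name scrape_history are illustrative assumptions rather than the script's real code.

```python
import asyncio
from datetime import datetime
from typing import Any, Awaitable, Callable, Dict, List

Message = Dict[str, Any]
FetchPage = Callable[[datetime, int], Awaitable[List[Message]]]


async def scrape_history(fetch_page: FetchPage, datetime_before: datetime,
                         limit_msg: int, repeat_number: int) -> List[Message]:
    """Fetch up to repeat_number pages of limit_msg messages, walking backwards in time."""
    collected: List[Message] = []
    cursor = datetime_before
    for _ in range(repeat_number):
        page = await fetch_page(cursor, limit_msg)
        if not page:
            break  # no older messages remain
        collected.extend(page)
        # Continue from the oldest message seen in this page.
        cursor = min(msg["date"] for msg in page)
    return collected
```

With Telethon, fetch_page would presumably wrap a call such as client(GetHistoryRequest(peer=entity, offset_id=0, offset_date=cursor, add_offset=0, limit=limit, max_id=0, min_id=0, hash=0)) and return the messages attribute of the result. Messages sharing the exact timestamp of the cursor may need extra care to avoid gaps or duplicates.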
After data collection, the scraper saves the messages and media information in multiple CSV files. The merging script consolidates these files into a single dataset for easier analysis.
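Since the merging step is a plain concatenation of CSV files, it can be approximated with the standard library alone. The following is a sketch, not the repository's merging script; merge_csv_files is a hypothetical name, and it assumes every output file has a header row.

```python
import csv
import glob
import os
from typing import Dict, List


def merge_csv_files(input_dir: str, output_path: str) -> int:
    """Concatenate every CSV in input_dir into output_path and return the row count."""
    paths = sorted(glob.glob(os.path.join(input_dir, "*.csv")))
    fieldnames: List[str] = []
    rows: List[Dict[str, str]] = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Build the union of all column names, keeping first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        # restval fills columns that a given input file did not have.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Write the merged file outside input_dir, otherwise a second run would pick up the previous merged output as an input.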
- FloodWaitError: Automatically waits for the required time if Telegram's rate limit is hit.
- SessionPasswordNeededError: Handles two-factor authentication if enabled on the account.
- General exceptions are caught and logged to ensure the scraper continues running smoothly.
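The wait-and-retry behaviour for rate limits can be illustrated with a small helper. This is a sketch under assumptions: call_with_flood_wait and the stand-in exception RateLimited are hypothetical names used so the example runs without Telethon. Telethon's FloodWaitError exposes the required wait as a seconds attribute, which is the only property the helper relies on.

```python
import asyncio
from typing import Awaitable, Callable, Type, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for telethon.errors.FloodWaitError, which also exposes .seconds."""

    def __init__(self, seconds: int) -> None:
        super().__init__(f"wait {seconds}s")
        self.seconds = seconds


async def call_with_flood_wait(
    request: Callable[[], Awaitable[T]],
    flood_error: Type[Exception] = RateLimited,
    max_retries: int = 3,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await request(), sleeping for the server-requested time when rate limited."""
    for attempt in range(max_retries + 1):
        try:
            return await request()
        except flood_error as exc:
            if attempt == max_retries:
                raise  # give up after the final attempt
            await sleep(getattr(exc, "seconds", 1))
    raise AssertionError("unreachable")
```

In a real run, passing flood_error=FloodWaitError (imported from telethon.errors) should make the helper react to Telegram's actual rate-limit exception. Capping the retries avoids waiting indefinitely when the limit keeps being hit.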
Contributions are welcome! Please feel free to submit a pull request or open an issue to suggest improvements or report bugs.
- Fork the repository.
- Create a new branch (git checkout -b feature/your-feature-name).
- Commit your changes (git commit -am 'Add your feature').
- Push to the branch (git push origin feature/your-feature-name).
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details. This code and documentation were written by Amir Jamali.