A comprehensive tool for parsing business data from Yandex Maps (Яндекс.Карты) and converting it to structured formats.
- Automated Business Data Extraction: Parse business information from Yandex Maps including names, ratings, reviews, addresses, phone numbers, and websites
- Multi-City Processing: Process multiple cities from a list automatically
- JSON to CSV Conversion: Convert parsed JSON data to CSV format with city names
- Data Cleaning: Built-in text cleaning and data normalization
- Duplicate Detection: Prevents duplicate entries during parsing
- Real-time Saving: Saves data incrementally to prevent data loss
pip install playwright pandas
playwright install chromium
yandex-maps-parser/
├── parser.py # Main parsing script
├── convert_cities_to_csv.py # JSON to CSV converter
└── README.md # This file
First, create a cities.txt file with city names (one per line):
Moscow
Saint Petersburg
Kazan
Novosibirsk
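As a rough illustration of how such a file might be read, here is a minimal sketch. The `load_cities` helper is hypothetical (it is not taken from parser.py); it assumes UTF-8 text, one city per line, and skips blank lines and repeated names.

```python
from pathlib import Path


def load_cities(path: str = "cities.txt") -> list[str]:
    """Return city names from a text file, one per line, in file order."""
    cities = []
    seen = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        # Skip empty lines and names that were already listed.
        if name and name not in seen:
            seen.add(name)
            cities.append(name)
    return cities
```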
Run the parser:
python parser.py
The script will:
- Open Yandex Maps in a browser
- Search for businesses in each city
- Extract detailed information for each business
- Save data to JSON files (one per city)
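The first two steps above could look roughly like the sketch below. It is not the code from parser.py: the `build_search_url` and `open_search` helpers and the `https://yandex.ru/maps/?text=...` URL pattern are assumptions made for illustration, and no page selectors are shown because the README does not document them. Only the query template comes from this README.

```python
from urllib.parse import urlencode

# Query template quoted in this README's configuration notes.
SEARCH_TEMPLATE = "производство {city_name}"


def build_search_url(city_name: str) -> str:
    """Build a Yandex Maps search URL (the URL pattern is an assumption)."""
    query = SEARCH_TEMPLATE.format(city_name=city_name)
    return "https://yandex.ru/maps/?" + urlencode({"text": query})


async def open_search(city_name: str) -> str:
    """Open the search page for one city and return the page title."""
    # Imported lazily so build_search_url can be used without Playwright.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        await page.goto(build_search_url(city_name), wait_until="domcontentloaded")
        title = await page.title()
        await browser.close()
        return title
```

Calling `asyncio.run(open_search("Kazan"))` would launch Chromium and load the search results for that city; extraction of individual businesses would follow from there.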
After parsing, convert all city JSON files to a single CSV:
python convert_cities_to_csv.py
This will:
- Read all JSON files from the ncity/ directory
- Combine data from all cities
- Add a city name column
- Export to all_cities.csv
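The conversion steps above can be sketched as follows. This is an illustration, not the contents of convert_cities_to_csv.py: it uses only the standard library `csv` and `json` modules rather than pandas, the function name is hypothetical, and it assumes each JSON file is named after its city (for example `ncity/Kazan.json`).

```python
import csv
import json
from pathlib import Path

COLUMNS = ["city", "name", "rating", "reviews", "address", "phone", "website"]


def convert_cities_to_csv(ncity_folder: str = "ncity",
                          output_file: str = "all_cities.csv") -> int:
    """Merge every JSON file in ncity_folder into one CSV; return the row count."""
    rows = []
    for json_path in sorted(Path(ncity_folder).glob("*.json")):
        city = json_path.stem  # assumption: the file name is the city name
        with json_path.open(encoding="utf-8") as f:
            for record in json.load(f):
                row = {"city": city}
                # Missing fields become empty strings instead of raising.
                row.update({col: record.get(col, "") for col in COLUMNS[1:]})
                rows.append(row)
    with open(output_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```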
Each city generates a JSON file with the following structure:
[
{
"name": "Company Name",
"rating": "4.5",
"reviews": "123",
"address": "Street Address, City",
"phone": "+7 XXX XXX XX XX",
"website": "company.com"
}
]
The final CSV contains these columns:
- city: City name
- name: Business name
- rating: Rating score
- reviews: Number of reviews
- address: Full address
- phone: Phone number
- website: Website URL
In parser.py, you can modify:
- max_pages: Maximum pages to parse per city (default: 100)
- Search query: Currently set to "производство {city_name}" (manufacturing)
- Delays and timeouts for stability
In convert_cities_to_csv.py, you can change:
- Input directory: ncity_folder parameter
- Output filename: output_file parameter
- Data cleaning rules
- Removes extra spaces and special characters
- Normalizes phone numbers
- Cleans website URLs
- Removes trailing dots from ratings
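One possible shape for these cleaning rules is sketched below. The function names and the exact rules are assumptions, not the project's actual code: "special characters" is read here as invisible control and format characters, phone normalization keeps only digits plus a leading `+`, and website cleaning drops the scheme and a trailing slash.

```python
import re
import unicodedata


def clean_text(value: str) -> str:
    """Collapse whitespace and drop invisible control/format characters."""
    spaced = re.sub(r"\s", " ", value)
    visible = "".join(ch for ch in spaced if unicodedata.category(ch)[0] != "C")
    return re.sub(r" +", " ", visible).strip()


def normalize_phone(value: str) -> str:
    """Keep digits only, preserving a leading plus sign if present."""
    value = value.strip()
    digits = re.sub(r"\D", "", value)
    return "+" + digits if value.startswith("+") and digits else digits


def clean_website(value: str) -> str:
    """Strip the http(s) scheme and any trailing slash."""
    return re.sub(r"^https?://", "", value.strip(), flags=re.IGNORECASE).rstrip("/")


def clean_rating(value: str) -> str:
    """Remove trailing dots, e.g. '4.5.' becomes '4.5'."""
    return value.strip().rstrip(".")
```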
The parser handles common formatting issues:
- Removes trailing numbers: CompanyName4 → CompanyName
- Removes status text: Company Закрыто до 08:00 → Company (the status text means "Closed until 08:00")
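A regex-based sketch of these two fixes follows. The patterns are assumptions: only "Закрыто до HH:MM" and "Открыто до HH:MM" status suffixes are matched, and the `clean_name` helper is hypothetical. The status text is removed first, because stripping digits first would damage the time at the end of the status and the status pattern would no longer match.

```python
import re

# Status suffix such as "Закрыто до 08:00" ("Closed until 08:00").
STATUS_RE = re.compile(r"\s*(?:Закрыто|Открыто)\s+до\s+\d{1,2}:\d{2}\s*$")
# Digits stuck to the end of the name, e.g. "CompanyName4".
TRAILING_DIGITS_RE = re.compile(r"\d+\s*$")


def clean_name(raw: str) -> str:
    """Strip a trailing status phrase, then trailing digits."""
    name = STATUS_RE.sub("", raw.strip())
    name = TRAILING_DIGITS_RE.sub("", name)
    return name.strip()
```

Note that a rule this simple would also shorten names that legitimately end in a number, so it may need an allowlist in practice.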
- Rate Limiting: The parser includes delays to respect Yandex Maps rate limits
- Error Handling: Continues processing even if individual businesses fail
- Data Persistence: Saves data after each successful extraction
- Browser Automation: Uses Playwright for reliable web scraping
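How the delay, error-handling, and persistence points might fit together is sketched below. This is a hypothetical loop, not the one in parser.py: `collect`, `extract`, and `delay_seconds` are illustrative names, and treating the (name, address) pair as the duplicate key is an assumption.

```python
import asyncio
import json
from pathlib import Path


async def collect(items, extract, out_path, delay_seconds=1.0):
    """Extract each item, skip failures and duplicates, save after every success."""
    out = Path(out_path)
    # Resume from an existing file so earlier results are not lost.
    records = json.loads(out.read_text(encoding="utf-8")) if out.exists() else []
    seen = {(r.get("name"), r.get("address")) for r in records}
    for item in items:
        try:
            record = await extract(item)
        except Exception as exc:  # one bad listing should not stop the run
            print(f"skipped {item!r}: {exc}")
            continue
        key = (record.get("name"), record.get("address"))
        if key not in seen:
            seen.add(key)
            records.append(record)
            # Rewrite the file after each new record (incremental saving).
            out.write_text(json.dumps(records, ensure_ascii=False, indent=2),
                           encoding="utf-8")
        await asyncio.sleep(delay_seconds)  # pause between requests
    return records
```

Rewriting the whole file on every record is simple and crash-safe enough for small result sets; for very large cities an append-only format such as JSON Lines might scale better.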
This tool is for educational and research purposes. Please respect Yandex Maps' terms of service and use responsibly. Consider implementing additional delays and respectful scraping practices for production use.
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is open source and available under the MIT License.
⭐ If this project helped you, please give it a star on GitHub!