looksg00d/yandex-maps-parser

Yandex Maps Parser

A comprehensive tool for parsing business data from Yandex Maps (Яндекс.Карты) and converting it to structured formats.

🚀 Features

  • Automated Business Data Extraction: Parse business information from Yandex Maps including names, ratings, reviews, addresses, phone numbers, and websites
  • Multi-City Processing: Process multiple cities from a list automatically
  • JSON to CSV Conversion: Convert parsed JSON data to CSV format with city names
  • Data Cleaning: Built-in text cleaning and data normalization
  • Duplicate Detection: Prevents duplicate entries during parsing
  • Real-time Saving: Saves data incrementally to prevent data loss

📋 Requirements

pip install playwright pandas
playwright install chromium

🗂️ Project Structure

yandex-maps-parser/
├── parser.py                 # Main parsing script
├── convert_cities_to_csv.py  # JSON to CSV converter
└── README.md                 # This file

🔧 Usage

1. Parsing Yandex Maps Data

First, create a cities.txt file with city names (one per line):

Moscow
Saint Petersburg
Kazan
Novosibirsk
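For illustration, a file in this format could be read as sketched below. This is not taken from parser.py; the function name load_cities and the choice to skip blank lines and repeated names are assumptions.

```python
from pathlib import Path


def load_cities(path: str = "cities.txt") -> list[str]:
    """Return non-empty city names in file order, dropping repeats."""
    seen: set[str] = set()
    cities: list[str] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()  # tolerate trailing spaces after a name
        if name and name not in seen:
            seen.add(name)
            cities.append(name)
    return cities
```

Stripping each line means stray whitespace after a city name does not end up in the search query or in an output file name.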

Run the parser:

python parser.py

The script will:

  • Open Yandex Maps in a browser
  • Search for businesses in each city
  • Extract detailed information for each business
  • Save data to JSON files (one per city)
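The per-city saving step, combined with the duplicate detection and incremental saving listed under Features, might look roughly like the sketch below. The class name CityStore, the {city}.json file naming, and the (name, address) duplicate key are all assumptions made for illustration; the actual logic in parser.py may differ.

```python
import json
from pathlib import Path


class CityStore:
    """Collects records for one city and rewrites its JSON file on every add."""

    def __init__(self, city: str, out_dir: str = "ncity") -> None:
        # Assumption: one file per city, named after the city.
        self.path = Path(out_dir) / f"{city}.json"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.records: list[dict] = []
        self._seen: set[tuple[str, str]] = set()

    def add(self, record: dict) -> bool:
        """Store the record unless its (name, address) pair was already seen."""
        key = (
            record.get("name", "").strip().lower(),
            record.get("address", "").strip().lower(),
        )
        if key in self._seen:
            return False
        self._seen.add(key)
        self.records.append(record)
        # Rewrite the whole file so a crash loses at most the record in progress.
        self.path.write_text(
            json.dumps(self.records, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
        return True
```

Passing ensure_ascii=False keeps Cyrillic names readable in the saved file instead of escaping them.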

2. Converting JSON to CSV

After parsing, convert all city JSON files to a single CSV:

python convert_cities_to_csv.py

This will:

  • Read all JSON files from the ncity/ directory
  • Combine data from all cities
  • Add city name column
  • Export to all_cities.csv
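The conversion steps above can be sketched as follows. This version uses only the standard library csv and json modules rather than pandas, so it is not the code in convert_cities_to_csv.py. The parameter names ncity_folder and output_file come from the Configuration section; the function name and the assumption that each file name is the city name are illustrative.

```python
import csv
import json
from pathlib import Path

COLUMNS = ["city", "name", "rating", "reviews", "address", "phone", "website"]


def convert_cities_to_csv(ncity_folder: str = "ncity",
                          output_file: str = "all_cities.csv") -> int:
    """Merge every JSON file in ncity_folder into one CSV; return the row count."""
    rows = []
    for json_path in sorted(Path(ncity_folder).glob("*.json")):
        city = json_path.stem  # assumption: the file name is the city name
        with json_path.open(encoding="utf-8") as f:
            for record in json.load(f):
                rows.append({"city": city, **record})
    with open(output_file, "w", newline="", encoding="utf-8") as f:
        # Missing fields become empty cells; unexpected fields are ignored.
        writer = csv.DictWriter(
            f, fieldnames=COLUMNS, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```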

📊 Output Format

JSON Output

Each city generates a JSON file with the following structure:

[
  {
    "name": "Company Name",
    "rating": "4.5",
    "reviews": "123",
    "address": "Street Address, City",
    "phone": "+7 XXX XXX XX XX",
    "website": "company.com"
  }
]

CSV Output

The final CSV contains these columns:

  • city: City name
  • name: Business name
  • rating: Rating score
  • reviews: Number of reviews
  • address: Full address
  • phone: Phone number
  • website: Website URL

⚙️ Configuration

Parser Settings

In parser.py, you can modify:

  • max_pages: Maximum pages to parse per city (default: 100)
  • Search query: Currently set to "производство {city_name}" (manufacturing)
  • Delays and timeouts for stability
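One way to keep these settings in one place is a small settings object, sketched below. Only max_pages, its default of 100, and the query text are taken from this README; the class name ParserSettings, the field names, and the delay and timeout values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParserSettings:
    max_pages: int = 100                               # documented default
    query_template: str = "производство {city_name}"   # "manufacturing <city>"
    delay_seconds: float = 2.0                         # hypothetical value
    timeout_ms: int = 30_000                           # hypothetical value

    def query_for(self, city_name: str) -> str:
        """Fill the city name into the search query template."""
        return self.query_template.format(city_name=city_name)
```

Searching for a different business type would then only need a different template, for example ParserSettings(query_template="кафе {city_name}").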

CSV Converter Settings

In convert_cities_to_csv.py, you can change:

  • Input directory: ncity_folder parameter
  • Output filename: output_file parameter
  • Data cleaning rules

🛠️ Data Cleaning Features

Automatic Text Cleaning

  • Removes extra spaces and special characters
  • Normalizes phone numbers
  • Cleans website URLs
  • Removes trailing dots from ratings

Business Name Cleaning

The parser handles common formatting issues:

  • Removes trailing numbers: CompanyName4 → CompanyName
  • Removes status text: Company Закрыто до 08:00 → Company ("Закрыто до 08:00" means "Closed until 08:00")
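Both name rules can be expressed with two regular expressions, as in the sketch below. The function name and the list of status phrases are assumptions; only "Закрыто до 08:00" appears in this README, and "Открыто" ("Open") is added here as a plausible counterpart.

```python
import re

# Status phrases that may be appended to a name (assumed list; extend as needed).
STATUS_PATTERN = re.compile(r"\s*(Закрыто|Открыто)\s+до\s+\d{1,2}:\d{2}\s*$")
# Digits at the very end of the name that follow a non-digit character.
TRAILING_NUMBER = re.compile(r"(?<=\D)\d+$")


def clean_business_name(name: str) -> str:
    """Strip a trailing status phrase first, then any trailing number."""
    name = STATUS_PATTERN.sub("", name.strip())
    return TRAILING_NUMBER.sub("", name).strip()
```

A rule this simple also shortens legitimate names that end in a number (for example "Studio 54" becomes "Studio"), so it is worth spot-checking the output.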

📝 Notes

  • Rate Limiting: The parser includes delays to respect Yandex Maps rate limits
  • Error Handling: Continues processing even if individual businesses fail
  • Data Persistence: Saves data after each successful extraction
  • Browser Automation: Uses Playwright for reliable web scraping
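The delay and error-handling behaviour described in these notes can be sketched without a browser, as below. The function process_businesses and its parameters are hypothetical; in the real parser, extract would presumably wrap Playwright page interactions, and on_success would write to the city's JSON file.

```python
import asyncio
import random


async def process_businesses(items, extract, on_success, base_delay: float = 1.0):
    """Run extract(item) for each item, pausing between items and skipping failures."""
    failures = []
    for item in items:
        try:
            record = await extract(item)
        except Exception as exc:  # one bad business card must not stop the run
            failures.append((item, exc))
        else:
            on_success(record)  # save immediately after each successful extraction
        # Randomised pause so requests are not sent at a fixed rhythm.
        await asyncio.sleep(base_delay + random.uniform(0, base_delay / 2))
    return failures
```

Returning the failures instead of only logging them makes it possible to retry the skipped businesses at the end of a run.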

⚠️ Disclaimer

This tool is for educational and research purposes. Please respect Yandex Maps' terms of service and use responsibly. Consider implementing additional delays and respectful scraping practices for production use.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📄 License

This project is open source and available under the MIT License.

⭐ If this project helped you, please give it a star on GitHub!
