MeowmelMuku/WeatherScraperForecaster


ClimateCrawlPredict

A data mining and machine learning project that automatically collects weather data from China Weather Network (www.weather.com.cn) and applies the K-Means clustering algorithm to identify recurring types of weather conditions.

📖 Overview

This project is an end-to-end pipeline for weather data acquisition and analysis. It consists of two main components:

Scrapy Spider: A web crawler built with Scrapy that extracts historical and forecast weather data from China Weather Network.

K-Means Clustering: A machine learning module that preprocesses the scraped data and uses the K-Means algorithm to group the weather data points into distinct clusters. Because the algorithm is unsupervised, it can surface common weather patterns (e.g., hot & dry, cold & humid, mild & rainy) without labeled data.
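To make the clustering step concrete, here is a minimal pure-Python sketch of the K-Means iteration (Lloyd's algorithm). It is not the project's code, which relies on Scikit-learn; the `kmeans` function, the sample values, and the choice of initial centroids are all assumptions made for illustration.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Cluster points with Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical, already-normalized (temperature, humidity) pairs.
samples = [(0.9, 0.1), (0.85, 0.2), (0.1, 0.9), (0.2, 0.8), (0.5, 0.5), (0.55, 0.45)]
labels, centroids = kmeans(
    samples, initial_centroids=[samples[0], samples[2], samples[4]]
)
print(labels)
```

With Scikit-learn, the same grouping would typically come from `KMeans(n_clusters=3).fit_predict(X)`, which also handles centroid initialization and repeated restarts.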

✨ Features

Modular Scrapy Spider: Configurable to scrape data for specific cities and date ranges.

Structured Data Storage: Outputs cleaned data into CSV or JSON formats for easy analysis.

Data Preprocessing: Handles missing values, normalizes features, and prepares data for machine learning.

Unsupervised Learning: Utilizes K-Means to find inherent groupings in weather data.

Visualization: Includes scripts to generate plots (e.g., Elbow Method, PCA scatter plots) to visualize clusters and results.
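The preprocessing and storage features listed above can be sketched with the standard library alone. This is a hedged illustration, not the project's implementation: the field names (`city`, `high_c`, `humidity`), the sample rows, and the helper functions are hypothetical, and mean imputation with min-max scaling is only one reasonable reading of "handles missing values, normalizes features".

```python
import csv
import io

# Hypothetical scraped rows; None marks a value the page did not provide.
rows = [
    {"city": "Beijing", "high_c": 30.0, "humidity": 40.0},
    {"city": "Beijing", "high_c": None, "humidity": 80.0},
    {"city": "Beijing", "high_c": 10.0, "humidity": 60.0},
    {"city": "Beijing", "high_c": 20.0, "humidity": None},
]
FEATURES = ["high_c", "humidity"]


def fill_missing(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)  # a constant column carries no information
    return [(v - lo) / (hi - lo) for v in column]


def preprocess(records, features):
    """Return copies of the records with imputed, scaled feature columns."""
    cleaned = [dict(r) for r in records]
    for name in features:
        column = min_max_scale(fill_missing([r[name] for r in records]))
        for record, value in zip(cleaned, column):
            record[name] = value
    return cleaned


def to_csv(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


cleaned = preprocess(rows, FEATURES)
csv_text = to_csv(cleaned)
print(csv_text)
```

Scaling matters here because K-Means measures Euclidean distance, so a feature with a wide numeric range would otherwise dominate the clustering. In a Pandas and Scikit-learn workflow, `DataFrame.fillna` and `MinMaxScaler` cover the same two steps.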

🛠️ Tech Stack

Web Scraping: Scrapy

Data Processing & Analysis: Pandas, NumPy

Machine Learning: Scikit-learn

Data Visualization: Matplotlib, Seaborn
