Commit cd036d4
add sites names
1 parent 3baba88

5 files changed: +55 −3 lines

sites-data-fetch/0.csv

Lines changed: 10 additions & 1 deletion

@@ -1,2 +1,11 @@
 url,title,name,ckan_version,description,detailed_description,api_title,contact_email,extensions,primary_language,num_datasets,num_groups,num_organizations,site_type,type_confidence,
-https://opendata.gov.nt.ca/
+https://opendata.gov.nt.ca/
+https://data.gov.sg/
+https://data.gov/
+https://hri.fi/en_gb/
+https://www.energidataservice.dk/
+https://www.neso.energy/
+https://data.dathere.com/
+https://txwaterdatahub.org/
+https://catalog.newmexicowaterdata.org/
+https://boernetx.opendataportal.us/
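The appended rows seed only the `url` column; the remaining schema fields (title, ckan_version, site_type, and so on) are left for the later pipeline steps to fill in. A minimal sketch of loading such a seed list with the stdlib `csv` module, using an in-memory stand-in for the file (the abbreviated header and two sample rows are illustrative, not the full schema):

```python
import csv
import io

# In-memory stand-in for sites-data-fetch/0.csv (header abbreviated).
# Seed rows carry only the url field, so DictReader returns the other
# columns as None; later pipeline stages populate them.
seed_csv = io.StringIO(
    "url,title,name,ckan_version,site_type\n"
    "https://opendata.gov.nt.ca/\n"
    "https://data.gov.sg/\n"
)

reader = csv.DictReader(seed_csv)
urls = [row["url"] for row in reader if row["url"]]
print(urls)
```

Reading through `DictReader` rather than splitting lines by hand keeps the loader robust if later commits start populating the extra columns.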

sites-data-fetch/3-siteType.py

Lines changed: 1 addition & 1 deletion

@@ -451,7 +451,7 @@ def analyze_single_url(self, url: str) -> None:
 def main():
     """Main function to run the site type detector."""
     # Configuration
-    INPUT_FILE = '1.csv'
+    INPUT_FILE = '2.csv'
     OUTPUT_FILE = '3.csv'
     URL_COLUMN = 'url'
     TYPE_COLUMN = 'site_type'

sites-data-fetch/5-locationAnalyser.py

Lines changed: 1 addition & 1 deletion

@@ -776,7 +776,7 @@ def main():
     print()

     # Get API key from environment variable or hardcoded value
-    api_key = os.getenv("OPENROUTER_API_KEY")
+    api_key = os.getenv("OPEN_ROUTER_KEY")

     # Check if input file exists
     if not os.path.exists(INPUT_FILE):
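Renaming the environment variable means any shell still exporting the old `OPENROUTER_API_KEY` yields `api_key = None`, which the log file in this same commit records as a string of 401s. A hedged sketch of failing fast instead (the helper name and the dual-name fallback are assumptions for illustration, not the script's actual code):

```python
import os

def load_openrouter_key() -> str:
    """Return the OpenRouter API key, failing fast if it is unset.

    Hypothetical helper: checks the new variable name first, then falls
    back to the legacy name so existing environments keep working.
    """
    key = os.getenv("OPEN_ROUTER_KEY") or os.getenv("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError(
            "Set OPEN_ROUTER_KEY (or the legacy OPENROUTER_API_KEY) "
            "before running the analyser"
        )
    return key
```

Raising at startup turns a silently missing key into one immediate, actionable error rather than three doomed API attempts per site.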

sites-data-fetch/ckan_location_analyzer.log

Lines changed: 19 additions & 0 deletions

@@ -8,3 +8,22 @@
 2025-07-02 18:37:30,116 - MainThread - INFO - Region distribution:
 2025-07-02 18:37:30,116 - MainThread - INFO - North America: 1 (100.0%)
 2025-07-02 18:37:30,116 - MainThread - INFO - Done!
+2025-08-29 16:32:50,118 - MainThread - INFO - Reading CSV from 4.csv
+2025-08-29 16:32:50,119 - MainThread - INFO - Processing all 1 rows
+2025-08-29 16:32:50,129 - ThreadPoolExecutor-0_0 - INFO - Thread 6202830848: Processing site: opendata-gov-nt-ca (https://opendata.gov.nt.ca/)
+2025-08-29 16:32:50,273 - ThreadPoolExecutor-0_0 - INFO - Thread 6202830848: Calling API with model: google/gemini-2.0-flash-001, max_tokens: 2000
+2025-08-29 16:32:50,444 - ThreadPoolExecutor-0_0 - ERROR - Giving up call_openrouter_api(...) after 1 tries (requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://openrouter.ai/api/v1/chat/completions)
+2025-08-29 16:32:50,444 - ThreadPoolExecutor-0_0 - WARNING - Attempt 1/3 failed: 401 Client Error: Unauthorized for url: https://openrouter.ai/api/v1/chat/completions
+2025-08-29 16:32:51,453 - ThreadPoolExecutor-0_0 - INFO - Thread 6202830848: Calling API with model: google/gemini-2.0-flash-001, max_tokens: 2000
+2025-08-29 16:32:51,630 - ThreadPoolExecutor-0_0 - ERROR - Giving up call_openrouter_api(...) after 1 tries (requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://openrouter.ai/api/v1/chat/completions)
+2025-08-29 16:32:51,630 - ThreadPoolExecutor-0_0 - WARNING - Attempt 2/3 failed: 401 Client Error: Unauthorized for url: https://openrouter.ai/api/v1/chat/completions
+2025-08-29 16:32:53,639 - ThreadPoolExecutor-0_0 - INFO - Thread 6202830848: Calling API with model: google/gemini-2.0-flash-001, max_tokens: 2000
+2025-08-29 16:32:53,823 - ThreadPoolExecutor-0_0 - ERROR - Giving up call_openrouter_api(...) after 1 tries (requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://openrouter.ai/api/v1/chat/completions)
+2025-08-29 16:32:53,823 - ThreadPoolExecutor-0_0 - WARNING - Attempt 3/3 failed: 401 Client Error: Unauthorized for url: https://openrouter.ai/api/v1/chat/completions
+2025-08-29 16:32:53,823 - ThreadPoolExecutor-0_0 - ERROR - All 3 attempts failed
+2025-08-29 16:32:53,933 - MainThread - INFO - Saving final results to 5.csv
+2025-08-29 16:32:53,940 - MainThread - INFO - Processing complete. Successfully processed: 0, Failed: 1
+2025-08-29 16:32:53,941 - MainThread - INFO - Countries identified: 0 (0.0%)
+2025-08-29 16:32:53,941 - MainThread - INFO - Region distribution:
+2025-08-29 16:32:53,941 - MainThread - INFO - Global / Uncertain: 1 (100.0%)
+2025-08-29 16:32:53,941 - MainThread - INFO - Done!
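The trace shows a two-layer retry: an inner wrapper that gives up after one try, and an outer loop making three attempts with growing pauses (about 1 s, then 2 s). Since a 401 is an authorization failure rather than a transient fault, retrying it only delays the inevitable. A stdlib-only sketch of an outer loop that treats auth errors as permanent (the function names and delays are assumptions modelled on the log, not the repo's actual `backoff` configuration):

```python
import time

class Unauthorized(Exception):
    """Stand-in for requests' 401 HTTPError."""

def call_with_retries(call, attempts=3, base_delay=1.0):
    """Retry transient failures up to `attempts` times, but re-raise
    authorization errors immediately instead of retrying them."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Unauthorized:
            raise  # a bad key will not fix itself on retry
        except Exception:
            if attempt == attempts:
                raise
            # Mirrors the log's growing gaps (~1 s, then ~2 s)
            time.sleep(base_delay * attempt)
```

With this shape the failed run above would have stopped after one request per site instead of three, surfacing the bad key right away.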

sites-data-fetch/requirements.txt

Lines changed: 24 additions & 0 deletions

@@ -0,0 +1,24 @@
+# Core HTTP and web scraping
+requests>=2.28.0
+beautifulsoup4>=4.11.0
+
+# Translation and language detection
+googletrans==4.0.0rc1
+langdetect>=1.0.9
+langcodes>=3.3.0
+
+# Text processing and utilities
+python-slugify>=8.0.0
+
+# Data processing
+pandas>=1.5.0
+
+# Progress bars and retries
+tqdm>=4.64.0
+backoff>=2.2.0
+
+# Geographic data
+pycountry>=22.3.0
+
+# Environment variables
+python-dotenv>=1.0.0
