Commit fe9257f

tim committed
Rewrite Niedersachsen spider with shapefile-first approach
Replaces API-only scraping with LSN geodata integration providing 100% geolocation coverage (4,250 schools). Implements robust matching, normalization, and async downloads.

Key improvements:
- Add shapefile integration for 4 LSN categories (ABS, Förder, BBS, SdG)
- Implement Unicode-aware (NFKD) German character normalization
- Add collision-safe API indexing with form-aware disambiguation
- Use async Scrapy Requests for shapefile downloads (non-blocking)
- Generate stable synthetic IDs (SHA-1) for shapefile-only schools
- Add path traversal protection in ZIP extraction
- Ensure API-only schools are fetched after shapefile processing
- Add enhanced school form normalization with long-form variants
- Add allowed_domains and stats tracking

Coverage: 4,250 schools with geodata (~75% API-enriched, ~25% shapefile-only)

Adds pyproj dependency for CRS transformation support.

Note: API endpoint (schulen.nibis.de) is public but may experience intermittent connectivity. Shapefile processing is fully functional and provides complete geodata coverage.
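The normalization and synthetic-ID items in the commit message can be illustrated with a minimal sketch. This is not the spider's actual code: the function names `normalize_name` and `synthetic_id`, the `NI-SHP-` prefix, the 12-character truncation, and the choice of key fields (category, name, city) are all assumptions made for illustration. The sketch only shows how NFKD decomposition and a SHA-1 digest could yield a matching key and an ID that stays stable across runs.

```python
import hashlib
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Return a comparison key for a German school name (hypothetical helper)."""
    # casefold() lowercases and maps "ß" to "ss".
    folded = name.casefold()
    # NFKD splits umlauts into base letter + combining mark (ä -> a + U+0308).
    decomposed = unicodedata.normalize("NFKD", folded)
    # Drop the combining marks so "köln" and "koln" produce the same key.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", stripped).strip()


def synthetic_id(category: str, name: str, city: str) -> str:
    """Derive a deterministic ID for a school that has no API record.

    The prefix and the key fields are assumptions, not the spider's format.
    """
    key = "|".join((category, normalize_name(name), normalize_name(city)))
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return f"NI-SHP-{digest[:12]}"
```

Because the ID is derived only from normalized attributes, re-running the spider on unchanged shapefile data would reproduce the same IDs. Truncating the digest trades a small collision risk for shorter identifiers.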
1 parent e30ae74 commit fe9257f

4 files changed: +825 -354 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -49,9 +49,9 @@ When available, we try to use the geolocations provided by the data publishers.
 | BB | ✅ Yes | WFS |
 | HB | ❌ No | - |
 | HH | ✅ Yes | WFS |
-| HE | ❌ No | - |
+| HE | ⚠️ Partial (90.7%) | Extracted from OSM on detail pages (1,863/2,054 schools). The 191 schools without coordinates include both schools with placeholder coordinates (-1.0, -1.0) that are filtered to null and schools with no map data at all. |
 | MV | ✅ Yes | WFS |
-| NI | ❌ No | - |
+| NI | ✅ Yes (4,250 schools) | Shapefile-first approach: all 4,250 LSN schools have geodata. ~75% enriched with API data (phone, email, etc.), ~25% basic shapefile data only. Requires manual shapefile setup in cache/niedersachsen_shapefiles/. |
 | NW | ✅ Yes | Converted from EPSG:25832 in source CSV data |
 | RP | ❌ No | - |
 | SL | ✅ Yes | WFS |
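
The HE row describes placeholder coordinates (-1.0, -1.0) being filtered to null and reports 90.7% coverage. A minimal sketch of that filtering and of the coverage arithmetic might look like the following. The names `clean_coords` and `coverage_percent` are hypothetical and not taken from the repository, and the exact comparison the HE spider uses is not shown in this diff.

```python
from typing import Optional, Tuple

# Placeholder pair mentioned in the README row for HE.
PLACEHOLDER = (-1.0, -1.0)


def clean_coords(lat, lon) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) as floats, or None for missing or placeholder values."""
    if lat is None or lon is None:
        return None
    pair = (float(lat), float(lon))
    # Treat the known placeholder as "no coordinates".
    if pair == PLACEHOLDER:
        return None
    return pair


def coverage_percent(with_coords: int, total: int) -> float:
    """Share of schools with usable coordinates, rounded to one decimal."""
    return round(100.0 * with_coords / total, 1)
```

With the figures from the table, `coverage_percent(1863, 2054)` gives 90.7, and 2,054 - 1,863 leaves the 191 schools without coordinates.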
