Commit 0feff03
[ST]: Replace HTML scraper with ArcGIS API scraper (#219)
* Replace Sachsen-Anhalt HTML scraper with ArcGIS API scraper
Switch from HTML scraping (bildung-lsa.de) to ArcGIS FeatureServer API.
This provides cleaner data access and adds geolocation support.
Changes:
- Replace HTML parsing with ArcGIS REST API JSON parsing
- Add coordinate transformation (EPSG:25832 -> WGS84) using pyproj
- Add geolocation coverage: 100% (857 schools)
- Update ID scheme: ST-1001186 -> ST-ARC00001 (OBJECTID-based)
- Update README: Mark ST as having geolocation via ArcGIS
- Note: OBJECTID stability is uncertain (may change on reimport)
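To make the ID change concrete, here is a minimal sketch of turning a FeatureServer query response into school records. It assumes the usual Esri JSON layout (`features`, each with `attributes` and `geometry`). It also assumes the new ID is the OBJECTID zero-padded to five digits, which is inferred only from the `ST-ARC00001` example. The `Name` field and both function names are hypothetical and are not taken from the commit.

```python
from typing import Any, Dict, List


def make_school_id(object_id: int) -> str:
    """Build an ID like ST-ARC00001 (padding width assumed from the example)."""
    return f"ST-ARC{object_id:05d}"


def parse_features(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten an Esri JSON query response into plain school dicts."""
    schools = []
    for feature in payload.get("features", []):
        attrs = feature["attributes"]
        # Geometry may be absent or null; fall back to an empty dict.
        geom = feature.get("geometry") or {}
        schools.append(
            {
                "id": make_school_id(attrs["OBJECTID"]),
                "name": attrs.get("Name"),  # hypothetical field name
                "x": geom.get("x"),  # easting in the layer's CRS
                "y": geom.get("y"),  # northing in the layer's CRS
            }
        )
    return schools
```

Because the ID is derived from OBJECTID, a reimport that renumbers objects would change every ID, which is the risk the note above points at.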
Data source: services-eu1.arcgis.com ArcGIS FeatureServer
Coverage: 857 schools (excludes vocational schools)
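For orientation, a paginated FeatureServer query could be sketched as below. The commit names only the host, so the path in `QUERY_URL` is a placeholder and not the real service path. The query parameters (`where`, `outFields`, `f`, `resultOffset`, `resultRecordCount`) and the `exceededTransferLimit` flag are standard ArcGIS REST API names; the function names and the page size are assumptions.

```python
import json
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder only: the organisation ID and service name are not given in the commit.
QUERY_URL = (
    "https://services-eu1.arcgis.com/<org-id>/arcgis/rest/services/"
    "<service>/FeatureServer/0/query"
)


def build_query(offset: int, page_size: int) -> str:
    """Return the query URL for one page of features."""
    params = {
        "where": "1=1",  # no filter: select every feature
        "outFields": "*",  # return all attribute fields
        "f": "json",
        "resultOffset": offset,
        "resultRecordCount": page_size,
    }
    return f"{QUERY_URL}?{urlencode(params)}"


def fetch_json(url: str) -> Dict[str, Any]:
    """Download and decode one JSON page."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_features(
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
    page_size: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield features page by page until the server reports no more data."""
    offset = 0
    while True:
        page = fetch(build_query(offset, page_size))
        features = page.get("features", [])
        yield from features
        # The server sets exceededTransferLimit when further pages remain.
        if not features or not page.get("exceededTransferLimit"):
            break
        offset += len(features)
```

Passing `fetch` as a parameter keeps the paging logic testable without network access.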
* Remove personal dev files from .gitignore (now in global gitignore)
* Use None instead of empty strings for missing data
Replace empty string defaults with None when extracting school attributes.
This provides clearer semantics for missing data and follows database
best practices (NULL vs empty string).
Note: Current ArcGIS dataset has no missing values, but this change
future-proofs the code and follows Python/SQL conventions.
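The difference between `None` and an empty string can be illustrated with a small sketch. The field names (`Name`, `Telefon`) and both helper functions are hypothetical. The in-memory SQLite table is there only to show that `None` is stored as NULL; it is not the project's storage layer.

```python
import sqlite3
from typing import Any, Dict, Iterable, Optional


def extract_contact(attrs: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Read attributes without a "" default, so missing keys become None."""
    return {
        "name": attrs.get("Name"),  # hypothetical field name
        "phone": attrs.get("Telefon"),  # hypothetical field name
    }


def count_missing_phones(rows: Iterable[Dict[str, Optional[str]]]) -> int:
    """Store rows in a throwaway SQLite table and count NULL phone values."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE schools (name TEXT, phone TEXT)")
        # None is bound as SQL NULL, so IS NULL finds it; "" would not match.
        conn.executemany("INSERT INTO schools VALUES (:name, :phone)", list(rows))
        (missing,) = conn.execute(
            "SELECT COUNT(*) FROM schools WHERE phone IS NULL"
        ).fetchone()
        return missing
    finally:
        conn.close()
```

With an empty-string default, the same `IS NULL` query would report zero missing values, which hides the gap from downstream checks.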
* Remove exception handling from coordinate transformation
Let coordinate transformation fail loudly if it encounters issues
rather than silently logging a warning and continuing with null coordinates.
This follows the 'fail fast' principle - if transformation fails, we want
to know immediately so we can fix the root cause rather than silently
producing incomplete data.
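A fail-fast transformation might look like the sketch below, which is not the code from this commit. `Transformer.from_crs`, `always_xy` and `errcheck` are real pyproj options; the helper names and the extra finiteness check are assumptions. pyproj is imported lazily inside the factory so that the conversion logic can be exercised with a stub.

```python
import math
from typing import Callable, Tuple

# A transform maps (easting, northing) to (longitude, latitude).
Transform = Callable[[float, float], Tuple[float, float]]


def make_utm32_to_wgs84() -> Transform:
    """Create an EPSG:25832 -> EPSG:4326 transform (requires pyproj)."""
    from pyproj import Transformer  # third-party dependency

    # always_xy=True fixes the axis order to (x, y) in and (lon, lat) out.
    transformer = Transformer.from_crs("EPSG:25832", "EPSG:4326", always_xy=True)
    # errcheck=True asks pyproj to raise instead of returning inf on failure.
    return lambda easting, northing: transformer.transform(
        easting, northing, errcheck=True
    )


def to_lat_lon(
    easting: float, northing: float, transform: Transform
) -> Tuple[float, float]:
    """Convert one point; any error propagates to the caller (no try/except)."""
    lon, lat = transform(easting, northing)
    # Guard against non-finite results that would otherwise pass silently.
    if not (math.isfinite(lon) and math.isfinite(lat)):
        raise ValueError(f"non-finite result for ({easting}, {northing})")
    return lat, lon
```

In a scraper run, one would create the transform once with `make_utm32_to_wgs84()` and reuse it for every school, so a single bad coordinate aborts the run instead of producing a record with null coordinates.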
---------
Co-authored-by: tim <[email protected]>
1 parent 230ee34 commit 0feff03
2 files changed: 46 additions, 45 deletions