17 changes: 16 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,20 @@
# Change log

### 0.3.0
- Update converters to emit OSW 0.3 schema id and support new vegetation features (trees, tree rows, woods).
- Extend OSW normalizers to keep `leaf_cycle` and `leaf_type` where allowed for points, lines, and polygons.
- Add unit coverage for OSW 0.3 natural feature handling.
- Expand OSM normalizer coverage and robustness: preserve non-compliant/unknown tags as `ext:*`, canonicalize JSON ext values, normalize elevation from 3D geometries, tolerate string IDs, and harden edge-case handling with tests.
- OSW→OSM improvements: promote invalid/unknown fields (incl. dict/list) to `ext:*`, set `version="1"` for visible elements, derive `ext:elevation` from Z coords, and keep invalid incline/climb values under `ext:` instead of dropping them.
- OSM→OSW improvements: verify OSW 0.3 `$schema` headers, export tree/tree_row/wood features, treat `ext:` tags as valid identifiers in OSW normalizers for filtering, and add multi-exterior handling tests for zones/polygons plus line parsing guards.
- Added extensive unit tests for osm/osw normalizers and graph serializers (filters, geojson import/export, zebra crossing mapping, kerb/foot validators, invalid line/polygon/zone branches, ref normalization, etc.).
- Added fixtures for vegetation and 3D elevation scenarios (`tree-test.xml`) and custom-property round-trip checks.
- Implemented collision-free ID handling: sequential remapping of nodes/ways/relations on OSW→OSM export with reference rewrites, plus tests confirming sequential IDs and schema/tag updates.

### 0.2.13
- Added default `version="1"` attribute to all nodes, ways, and relations generated during OSW→OSM conversion.
- Introduced unit test coverage to verify version attributes are written for all OSM elements.

### 0.2.12
- Updated OSMTaggedNodeParser to apply the OSW node and point filters with normalization before adding loose tagged nodes, ensuring non-compliant features like crossings are no longer emitted.
- Extended serializer tests to cover the new tagged-node behavior, confirming that compliant kerb features are retained while schema-invalid crossings are skipped.
@@ -92,4 +107,4 @@
- Added unit test cases
- Added README.md file
- Added CHANGELOG.md file
- Added test.pypi pipeline
35 changes: 35 additions & 0 deletions docs/TESTING_OVERVIEW.md
@@ -0,0 +1,35 @@
Below is a breakdown of all test scenarios (`def test_` and `async def test_`) across the suite and the main areas they cover.

Total scenarios: 211
Code coverage: 98% (`coverage report`)

Test module | Scenarios | Focus / scenarios covered (key checks)
--- | --- | ---
tests/unit_tests/helpers/test_osm.py | 11 | Way/node/point/line/zone/polygon counters; entity counter dispatch; OSMGraph creation and simplification/geometry construction lifecycles; filter helpers return booleans for tagged inputs
tests/unit_tests/helpers/test_osw.py | 22 | OSW filters (all geometries); unzip/merge optional files & missing-file handling; simplify/construct graph end-to-end; per-entity counters; ext tag retention; temp-file cleanup; optional files presence/absence; merge deletes intermediates
tests/unit_tests/helpers/test_response.py | 6 | Response dataclass defaults, attribute access, repr/str formatting, mutation for lists/strings/error payloads
tests/unit_tests/test_formatter.py | 7 | Formatter success/error paths; converter delegation; cleanup lifecycle; exception surfacing; mocking converter calls; workdir creation/idempotence; cleanup of existing vs missing files
tests/unit_tests/test_osm2osw/test_osm2osw.py | 15 | OSM→OSW: file counts/types; width/incline validation/cleaning; `$schema`=0.3 headers; ext-tag passthrough; point geometry enforcement; duplicate vs unique IDs; schema header verification; bad-input path; type assertions; tree/tree_row/wood export coverage; invalid-node-tag skip logic
tests/unit_tests/test_osm_compliance/test_osm_compliance.py | 2 | OSW validation of OSW→OSM→OSW roundtrip; incline tag preservation using official validator
tests/unit_tests/test_osw2osm/test_osw2osm.py | 16 | OSW→OSM: version attrs (visible/non-visible); incline/climb handling; invalid incline to `ext:`; custom dict/list props to `ext:`; 3D elevation to `ext:elevation`; missing-zip error; invalid width to `ext:`; XML output type assertions; ext tags on invalid properties; climb suppression when incline present; sequential ID remap and ref rewrite assertions
tests/unit_tests/test_roundtrip/test_roundtrip.py | 2 | Full roundtrip (OSW zip → OSM XML → OSW → OSM) smoke checks, ID preservation, schema continuity, ext:* tag parity for both OSW-zip and raw-OSM starting points
tests/unit_tests/test_serializer/test_osm_graph.py | 37 | Parsers (ways/nodes/points/lines/zones/polygons) incl. invalid locations & multi-exteriors; tagged node parsing; simplify/construct geometries; to_geojson ID/export rules; to_undirected variants; filter_edges node-copy; from_geojson import/export; point ID prefix trimming; progress callbacks; invalid location skip logic; duplicate-id protection; empty-graph exports
tests/unit_tests/test_serializer/test_osm_normalizer.py | 19 | OSM normalizer tag filtering: datatype coercion/NaN removal; incline/climb/foot handling; ext tag retention; zebra crossing mapping; kerb/foot validators; width/incline edge cases; implied foot removal; `_id` sourcing when tags absent/empty
tests/unit_tests/test_serializer/test_osm_osm_normalizer.py | 11 | OSM normalizer edge cases: `_stash_ext` JSON canonicalization/errors/unknown keys; dict/list promotion to `ext:`; zone area tagging; elevation extraction fallbacks; ID normalization across nodes/ways/relations and refs/nodeRefs/refs; ref write-back branches; non-numeric ID tolerance; canonical `ext:` serialization
tests/unit_tests/test_serializer/test_osw_normalizer.py | 63 | OSW normalizer: filters/normalizers for all feature types; tree/tree_row/wood support; leaf_cycle/leaf_type validation; crossing markings (incl. zebra inference); kerb/foot/surface validators; invalid branches raising; keep_key/default behaviors; width/incline/climb handling; ext tag passthrough and ext-based filter classification; literal keep-key handling; natural-* guards; tactile paving/surface normalization

Detailed scenario highlights (what we explicitly exercise)
- tests/unit_tests/helpers/test_osm.py: async counters on `wa.microsoft.osm.pbf` confirm expected counts for ways/points/nodes; `get_osm_graph` builds an `OSMGraph` then `simplify_og`/`construct_geometries` run without mutating return types; way/node/point/zone/polygon filters accept tagged inputs and return booleans.
- tests/unit_tests/helpers/test_osw.py: counts for ways/nodes/points/zones/lines/polygons across the same PBF; unzip returns the expected nodes/edges/points artifacts and gracefully returns empty dict when files are missing; merge combines multiple GeoJSON FeatureCollections and deletes temp inputs; zone/polygon filters assert boolean output; temp cleanup covers both existing and already-removed files.
- tests/unit_tests/helpers/test_response.py: default `Response` has `status=True` with `None` files/error; supports list or string `generated_files`; preserves custom error messages and `None` errors in success cases.
- tests/unit_tests/test_formatter.py: `Formatter.osm2osw` happy/failed paths surface `Response.status`; workdir is created idempotently whether or not it exists; cleanup removes tracked files and ignores missing ones; `Formatter.osw2osm` delegates to `OSW2OSM.convert` exactly once (mocked) and propagates its response.
- tests/unit_tests/test_osm2osw/test_osm2osw.py: end-to-end conversion yields six outputs (nodes/points/edges/zones/polygons/lines) with string paths; GeoJSONs contain non-empty geometries with string `_id`s and no duplicates; width tags are numeric, incline tags remain numeric on edges, and invalid node tags lead to no files; `$schema` header equals 0.3 and carries through tree/tree_row/wood fixtures; ext:* properties are preserved; file naming matches expected entity types; failure path returns `status=False`.
- tests/unit_tests/test_osm_compliance/test_osm_compliance.py: runs OSW→OSM→OSW through `python_osw_validation` to assert zero validation issues; checks that incline tags survive the full round-trip.
- tests/unit_tests/test_osw2osm/test_osw2osm.py: converts OSW ZIPs to a single OSM XML, ensuring width tags are present and numeric; error path when ZIP is missing; valid incline tags are kept, climb tags are suppressed when incline is present, and invalid incline values are moved to `ext:incline`; custom/non-compliant properties (dict/list) are promoted to ext:* JSON; 3D node coordinates emit `ext:elevation`; `_ensure_version_attribute` backfills version on visible elements; sequential ID remap rewrites ids/refs; all generated paths are strings and end with `.xml`.
- tests/unit_tests/test_roundtrip/test_roundtrip.py: two smoke flows keep ext:* tags intact—(1) OSW ZIP → OSM XML → OSW → OSM, (2) raw OSM XML → OSW → OSM—comparing node/way ext:* sets for equality.
- tests/unit_tests/test_serializer/test_osm_graph.py: graph metadata (directed/multigraph) and undirected copies retain node attrs; parsers handle missing nodes/invalid coordinates and multi-exterior polygons/zones; tagged-node parser only ingests OSW nodes; simplify/construct geometries rebuild missing geometries for points/lines with node refs; `to_geojson` preserves IDs, trims point prefixes, handles empty graphs, exports progress callbacks; `from_geojson` ingests features and respects mapping hooks and filter functions.
- tests/unit_tests/test_serializer/test_osm_normalizer.py: width/incline/climb coercion removes NaN/invalid strings, retains valid ints/floats; climb removal rules when incline present, except steps keep climb/down; ext_osm_id assignment prefers tags but falls back to internal IDs and skips empty values; implied foot tags dropped where inappropriate.
- tests/unit_tests/test_serializer/test_osm_osm_normalizer.py: `_stash_ext` normalizes JSON strings, skips None, and serializes unknown structures; filter_tags moves unknown keys or invalid datatypes to ext:* and adds area tags for zones; elevation extraction rejects NaN, falling back through z/ele tags; ID normalization covers negative IDs and writes back refs/nodeRefs/refs variants.
- tests/unit_tests/test_serializer/test_osw_normalizer.py: validators classify sidewalks/crossings/traffic islands/stairs/living streets/powerpoles/trees/tree_rows/wood; invalid geometries raise where expected; stair normalization keeps/drops climb per validity and defaults highway/foot; width/incline/climb handling mirrors OSM normalizer; crossing markings inferred from zebra tags; keep_key/default behaviors honored; tactile paving/surface/leaf_cycle/leaf_type/kerb/foot rules validated; natural-* checks drop invalid feature types.
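
The `_stash_ext` behaviour listed above (normalizing JSON strings, skipping `None`, serializing unknown structures) can be pictured with a small sketch. The helper name `stash_ext_value` and its exact rules are assumptions made for illustration; the real `_stash_ext` may differ in signature and details.

```python
import json
from typing import Any, Optional


def stash_ext_value(value: Any) -> Optional[str]:
    """Hypothetical helper: return a canonical string for an ext:* tag value."""
    if value is None:
        return None  # nothing to store for missing values
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except ValueError:
            return value  # not JSON: keep the plain string as-is
        if isinstance(parsed, (dict, list)):
            # Re-serialize JSON containers with sorted keys and no extra spaces
            return json.dumps(parsed, sort_keys=True, separators=(",", ":"))
        return value  # JSON scalars (numbers, booleans) stay untouched
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True, separators=(",", ":"))
    return str(value)  # other scalars fall back to their string form
```

Sorting keys and fixing separators means two semantically equal payloads produce the same tag value, which is what makes round-trip comparisons of `ext:*` sets meaningful.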

Method: counted functions matching `def test_` and `async def test_` under `tests/` and grouped scenario themes per file.
72 changes: 72 additions & 0 deletions docs/id_remapping.md
@@ -0,0 +1,72 @@
# ID Remapping (OSW → OSM)

This document explains how IDs are generated and remapped when converting OSW (GeoJSON) to OSM XML.

## Goal
Produce collision-free OSM XML where all node/way/relation IDs are sequential per type (starting at 1) and all references are updated accordingly. The `_id` tags are rewritten to match the new IDs; original OSW identifiers survive only in other tags such as `ext:osm_id`.

## Process
1. **Initial IDs from OSW content**
   - Nodes/points/lines/zones/polygons parsed from OSW GeoJSON enter the OSM graph with their OSW `_id`/references.
   - Extension/unknown properties are preserved under `ext:*`; elevation from 3D coordinates becomes `ext:elevation`.

2. **OSW→OSM export**
   - `OSW2OSM.convert()` runs the normal ogr2osm pipeline, writing an OSM XML file.
   - `_ensure_version_attribute` then sets `version="1"` on every node, way, and relation that lacks a `version` attribute.

3. **Sequential remap**
   - After the XML is written, `_remap_ids_to_sequential` rewrites IDs and references:
     - Nodes are renumbered `1..N` in document order; their `_id` tags are updated to the new ID.
     - Ways are renumbered `1..M`; their `_id` tags are updated. All `<nd ref>` values are rewritten to the new node IDs.
     - Relations are renumbered `1..K`; their `_id` tags are updated. All `<member ref>` values are rewritten based on member `type` (node/way/relation) using the new ID maps.
   - The remap runs in-place on the XML so the final output has consistent, collision-free IDs and references.

4. **What remains**
   - Original OSW identifiers survive in other tags (e.g., `ext:osm_id` if provided, other `ext:*`), but `_id` reflects the new sequential OSM ID.
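
The version backfill from step 2 can be tried in isolation. The sketch below works on an XML string rather than a file path, and the function name `ensure_version` is an assumption for illustration; it follows the rule of adding `version="1"` only where the attribute is missing.

```python
from xml.etree import ElementTree as ET


def ensure_version(xml_text: str) -> str:
    """Return the XML with version="1" added to elements that lack a version."""
    root = ET.fromstring(xml_text)
    for tag in ("node", "way", "relation"):
        for element in root.iter(tag):
            if not element.get("version"):
                element.set("version", "1")  # existing versions are left alone
    return ET.tostring(root, encoding="unicode")


sample = '<osm><node id="1" lat="0" lon="0"/><way id="2" version="3"/></osm>'
print(ensure_version(sample))
```

Elements that already carry a version keep it, so running the step twice gives the same result as running it once.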

## Notes / rationale
- The remap ensures deterministic, collision-free IDs regardless of source naming schemes (e.g., OSW prefixes, extension data).
- Reference integrity is maintained by rewriting all node refs in ways and member refs in relations.
- Version attributes are normalized before remapping to satisfy OSM validators expecting `version`.
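
Reference integrity can be checked independently of the remap itself. The following is a minimal sketch, assuming the output is plain OSM XML; `find_dangling_refs` is a hypothetical helper, not a function in this repository.

```python
from typing import List
from xml.etree import ElementTree as ET


def find_dangling_refs(xml_text: str) -> List[str]:
    """List every way/relation reference that points at a missing element."""
    root = ET.fromstring(xml_text)
    # Collect the known IDs per element type
    ids = {
        kind: {el.get("id") for el in root.iter(kind)}
        for kind in ("node", "way", "relation")
    }
    problems = []
    for way in root.iter("way"):
        for nd in way.findall("nd"):
            if nd.get("ref") not in ids["node"]:
                problems.append(f"way {way.get('id')}: nd ref {nd.get('ref')}")
    for rel in root.iter("relation"):
        for member in rel.findall("member"):
            kind, ref = member.get("type"), member.get("ref")
            if kind in ids and ref not in ids[kind]:
                problems.append(f"relation {rel.get('id')}: {kind} ref {ref}")
    return problems
```

An empty list means every `<nd ref>` and `<member ref>` resolves to an element of the right type, which is the property the remap is expected to preserve.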

## Minimal example (what the remap does)
Input XML (simplified):
```xml
<osm>
  <node id="10" lat="0" lon="0"><tag k="_id" v="10"/></node>
  <node id="20" lat="1" lon="1"><tag k="_id" v="20"/></node>
  <way id="30">
    <nd ref="10"/><nd ref="20"/>
    <tag k="_id" v="30"/>
  </way>
  <relation id="40">
    <member type="node" ref="20"/>
    <member type="way" ref="30"/>
    <tag k="_id" v="40"/>
  </relation>
</osm>
```

After `_remap_ids_to_sequential`:
```xml
<osm>
  <node id="1" ...><tag k="_id" v="1"/></node>
  <node id="2" ...><tag k="_id" v="2"/></node>
  <way id="1">
    <nd ref="1"/><nd ref="2"/>
    <tag k="_id" v="1"/>
  </way>
  <relation id="1">
    <member type="node" ref="2"/>
    <member type="way" ref="1"/>
    <tag k="_id" v="1"/>
  </relation>
</osm>
```
All IDs now start at 1 per type, and every reference points to the remapped IDs.
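
The example can be reproduced with a standalone sketch that follows the same steps on an in-memory string. The function `remap_ids` is written for illustration only and is not the library's `_remap_ids_to_sequential`, which operates on a file path.

```python
from xml.etree import ElementTree as ET

SAMPLE = """<osm>
  <node id="10" lat="0" lon="0"><tag k="_id" v="10"/></node>
  <node id="20" lat="1" lon="1"><tag k="_id" v="20"/></node>
  <way id="30"><nd ref="10"/><nd ref="20"/><tag k="_id" v="30"/></way>
  <relation id="40">
    <member type="node" ref="20"/>
    <member type="way" ref="30"/>
    <tag k="_id" v="40"/>
  </relation>
</osm>"""


def remap_ids(xml_text: str) -> ET.Element:
    """Renumber nodes, ways, and relations from 1 and rewrite references."""
    root = ET.fromstring(xml_text)
    maps = {}
    for kind in ("node", "way", "relation"):
        mapping = {}
        for new_id, elem in enumerate(root.iter(kind), start=1):
            mapping[elem.get("id")] = str(new_id)
            elem.set("id", str(new_id))
            for tag in elem.findall("./tag[@k='_id']"):
                tag.set("v", str(new_id))  # keep the _id tag in sync
        maps[kind] = mapping
    # Way node references always point at nodes
    for nd in root.iter("nd"):
        ref = nd.get("ref")
        if ref in maps["node"]:
            nd.set("ref", maps["node"][ref])
    # Relation members are resolved through the map for their own type
    for member in root.iter("member"):
        mapping = maps.get(member.get("type"), {})
        ref = member.get("ref")
        if ref in mapping:
            member.set("ref", mapping[ref])
    return root


print(ET.tostring(remap_ids(SAMPLE), encoding="unicode"))
```

Keeping one map per element type matters because a node and a way may share the same old ID; a single shared map would rewrite member references to the wrong target.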

## Relevant code
- Entry point: `OSW2OSM.convert()` (`src/osm_osw_reformatter/osw2osm/osw2osm.py`)
  - Calls `_ensure_version_attribute`
  - Calls `_remap_ids_to_sequential`
- Remap implementation: `_remap_ids_to_sequential` in `osw2osm.py` rewrites element IDs and their refs in-place and updates `_id` tags.
2 changes: 1 addition & 1 deletion requirements.txt
@@ -5,4 +5,4 @@ shapely~=2.0.2
pyproj~=3.6.1
coverage~=7.5.1
ogr2osm==1.2.0
-python-osw-validation==0.2.15
+python-osw-validation==0.3.1
67 changes: 67 additions & 0 deletions src/osm_osw_reformatter/osw2osm/osw2osm.py
@@ -1,5 +1,6 @@
import gc
import ogr2osm
from xml.etree import ElementTree as ET
from pathlib import Path
from ..helpers.osw import OSWHelper
from ..helpers.response import Response
@@ -32,6 +33,8 @@ def convert(self) -> Response:
            # Instantiate either ogr2osm.OsmDataWriter or ogr2osm.PbfDataWriter
            data_writer = ogr2osm.OsmDataWriter(output_file, suppress_empty_tags=True)
            osm_data.output(data_writer)
            self._ensure_version_attribute(output_file)
            self._remap_ids_to_sequential(output_file)

            del translation_object
            del datasource
@@ -46,3 +49,67 @@ def convert(self) -> Response:
        finally:
            gc.collect()
        return resp

    @staticmethod
    def _ensure_version_attribute(osm_xml_path: Path) -> None:
        """Ensure nodes, ways, and relations include a version attribute."""
        try:
            tree = ET.parse(osm_xml_path)
        except Exception:
            return

        root = tree.getroot()
        for tag in ('node', 'way', 'relation'):
            for element in root.findall(f'.//{tag}'):
                if not element.get('version'):
                    element.set('version', '1')

        tree.write(osm_xml_path, encoding='utf-8', xml_declaration=True)

    @staticmethod
    def _remap_ids_to_sequential(osm_xml_path: Path) -> None:
        """Remap node/way/relation IDs to sequential values starting at 1 and update references."""
        try:
            tree = ET.parse(osm_xml_path)
        except Exception:
            return

        root = tree.getroot()

        def remap_elements(xpath: str):
            mapping = {}
            elems = root.findall(xpath)
            for idx, elem in enumerate(elems, start=1):
                old_id = elem.get('id')
                if old_id is None:
                    continue
                mapping[old_id] = str(idx)
                elem.set('id', str(idx))
                for tag in elem.findall("./tag[@k='_id']"):
                    tag.set('v', str(idx))
            return mapping

        node_map = remap_elements('.//node')
        way_map = remap_elements('.//way')
        rel_map = remap_elements('.//relation')

        # Update way nd refs
        for way in root.findall('.//way'):
            for nd in way.findall('nd'):
                ref = nd.get('ref')
                if ref in node_map:
                    nd.set('ref', node_map[ref])

        # Update relation member refs
        for rel in root.findall('.//relation'):
            for member in rel.findall('member'):
                ref = member.get('ref')
                m_type = member.get('type')
                if m_type == 'node' and ref in node_map:
                    member.set('ref', node_map[ref])
                elif m_type == 'way' and ref in way_map:
                    member.set('ref', way_map[ref])
                elif m_type == 'relation' and ref in rel_map:
                    member.set('ref', rel_map[ref])

        tree.write(osm_xml_path, encoding='utf-8', xml_declaration=True)
6 changes: 6 additions & 0 deletions src/osm_osw_reformatter/serializer/osm/osm_graph.py
@@ -730,3 +730,9 @@ def from_geojson(cls, nodes_path, edges_path):

        for edge_feature in edges_fc['features']:
            props = edge_feature['properties']
            u = props.pop('_u_id')
            v = props.pop('_v_id')
            props['geometry'] = shape(edge_feature['geometry'])
            G.add_edges_from([(u, v, props)])

        return osm_graph