Commit 4e64912

Fix multistage processing
The multi-stage processing in the flex output has some problems when updating data, for instance when relations whose tags are used by their member ways are deleted. This commit reorganizes how the multi-stage processing is done to address these problems.

1. Objects are now "marked" differently. Instead of marking them in the process_* Lua functions, a new Lua function "select_relation_members()" is introduced. It is called not only for all new relations but also for deleted relations and, when a relation has changed, for the old version of that relation. This is needed so that way entries in the database that used to depend on a parent relation's tags, but no longer do, are also marked and then re-created. This function must return the ids of the ways that need to be reprocessed in stage 2.

2. The Lua mark() function is removed. Instead, the return value of "select_relation_members()" is used to mark way ids. Marking nodes and relations isn't supported: it was never possible to mark nodes anyway, and I am not currently aware of any use case that needs marking relations. Both can be reintroduced later when we have a better idea of how to handle them. For the time being we concentrate on the important use case where member ways of relations are handled specially.

3. This introduces a new processing stage 1c:
   * stage 1a: Read the input file and process all objects in it.
   * stage 1b: Process objects that depend on objects from 1a (i.e. changed nodes trigger changes in the ways containing those nodes, and changes in any object can trigger changes in its parent relations).
   * stage 1c: Process the parent relations of objects marked during stage 1a/1b (this one is new).
   * stage 2: Reprocess objects marked in stage 1a or 1b.

4. There is a new Lua helper function osm2pgsql.way_member_ids() that returns the ids of all way members of the specified relation. This is often needed in "select_relation_members()". If you need more complex processing, for instance to use the member roles to decide which ways need stage 2 processing, you can still write your own loop.

This commit also reduces the number of processes used in the TestPgsqlUpdateParallel regression test from 16 to 15, so that we don't hit the default limit of 100 database connections. This is probably something we should look into, but not here and now.
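Point 1 can be made concrete with a small C++ sketch of which ways end up marked when a relation changes. The selection runs on both the old and the new version, so a way dropped from a route is still reprocessed. The same holds for every member of a relation that stops being a route. `RelationVersion`, `select_members()` and `ways_to_mark()` are hypothetical names, not osm2pgsql's types. Pass `nullptr` for the missing version of an added or deleted relation.

```cpp
#include <cstdint>
#include <initializer_list>
#include <set>
#include <vector>

// Hypothetical, simplified relation version: whether it is a route, plus the
// ids of its way members. These are not osm2pgsql's data structures.
struct RelationVersion {
    bool is_route;
    std::vector<std::int64_t> way_ids;
};

// Stand-in for a user's select_relation_members(): routes select all their
// member ways, everything else selects nothing.
std::vector<std::int64_t> select_members(RelationVersion const &rel)
{
    return rel.is_route ? rel.way_ids : std::vector<std::int64_t>{};
}

// Ways to mark for stage 2 when a relation is added, modified or deleted.
// The selection also runs on the old version, so ways that used to depend on
// the relation (but no longer do) are re-created as well.
std::set<std::int64_t> ways_to_mark(RelationVersion const *old_version,
                                    RelationVersion const *new_version)
{
    std::set<std::int64_t> marked;
    for (auto const *version : {old_version, new_version}) {
        if (version) {
            auto const ids = select_members(*version);
            marked.insert(ids.begin(), ids.end());
        }
    }
    return marked;
}
```

If only the new version were consulted, way 10 in a change from `{10, 11}` to `{11, 12}` would keep the relation's data in the database even though it is no longer a member.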
1 parent 858bebd commit 4e64912

13 files changed (735 additions, 145 deletions)


docs/flex.md

Lines changed: 68 additions & 21 deletions
```diff
@@ -36,14 +36,14 @@ The following functions are defined:
 * `osm2pgsql.define_table(options)`: Define a table. This is the more flexible
   function behind all the other `define_*_table()` functions. It gives you
   more control than the more convenient other functions.
-* `osm2pgsql.mark_way(id)`: Mark the OSM way with the specified id. This way
-  will be processed (again) in stage 2.
 
 You are expected to define one or more of the following functions:
 
-* `osm2pgsql.process_node()`: Called for each node.
-* `osm2pgsql.process_way()`: Called for each way.
-* `osm2pgsql.process_relation()`: Called for each relation.
+* `osm2pgsql.process_node()`: Called for each new or changed node.
+* `osm2pgsql.process_way()`: Called for each new or changed way.
+* `osm2pgsql.process_relation()`: Called for each new or changed relation.
+* `osm2pgsql.select_relation_members()`: Called for each deleted or added
+  relation. See below for more details.
 
 Osm2pgsql also provides some additional functions in the
 [lua-lib.md](Lua helper library).
```
```diff
@@ -76,7 +76,7 @@ stored as is, relation ids will be stored as negative numbers.
 With the `osm2pgsql.define_table()` function you can also define tables that
 * don't have any ids, but those tables will never be updated by osm2pgsql
 * take *any OSM object*, in this case the type of object is stored in an
-  additional column.
+  additional `char(1)` column.
 * are in a specific PostgresSQL tablespace (set option `data_tablespace`) or
   that get their indexes created in a specific tablespace (set option
   `index_tablespace`).
```
```diff
@@ -242,25 +242,72 @@ a default transformation. These are the defaults:
 
 ## Stages
 
-Osm2pgsql processes the data in up to two stages. You can mark ways in stage 1
-for processing in stage 2 by calling `osm2pgsql.mark_way(id)`. If you don't
-mark any ways, nothing will be done in stage 2.
+When processing OSM data, osm2pgsql reads the input file(s) in order, nodes
+first, then ways, then relations. This means that when the ways are read and
+processed, osm2pgsql can't know yet whether a way is in a relation (or in
+several). But for some use cases we need to know in which relations a way is
+and what the tags of these relations are or the roles of those member ways.
+The typical case are relations of type `route` (bus routes etc.) where we
+might want to render the `name` or `ref` from the route relation onto the
+way geometry.
+
+The osm2pgsql flex backend supports this use case by adding an additional
+"reprocessing" step. Osm2pgsql will call the Lua function
+`osm2pgsql.select_relation_members()` for each added, modified, or deleted
+relation. Your job is to figure out which way members in that relation might
+need the information from the relation to be rendered correctly and return
+those ids in a Lua table with the only field 'ways'. This is usually done with
+a function like this:
 
-You can look at `osm2pgsql.stage` to see in which stage you are.
+```
+function osm2pgsql.select_relation_members(relation)
+    if relation.tags.type == 'route' then
+        return { ways = osm2pgsql.way_member_ids(relation) }
+    end
+end
+```
+
+Instead of using the helper function `osm2pgsql.way_member_ids()` which
+returns the ids of all way members, you can write your own code, for instance
+if you want to check the roles.
+
+Note that `select_relation_members()` is called for deleted relations and for
+the old version of a modified relation as well as for new relations and the
+new version of a modified relation. This is needed, for instance, to correctly
+mark member ways of deleted relations, because they need to be updated, too.
+The decision whether a way is to be marked or not can only be based on the
+tags of the relation and/or the roles of the members. If you take other
+information into account, updates might not work correctly.
 
-In stage 1 you can only look at each OSM object on its own. You can see
-its id and tags (and possibly timestamp, changeset, user, etc.), but you don't
-know how this OSM objects relates to other OSM objects (for instance whether a
-way you are looking at is a member in a relation). If this is enough to decide
-in which database table(s) and with what data an OSM object should end up in,
-then you can process the OSM object in stage 1. If, on the other hand, you
-need some extra information, you have to defer processing to the second stage.
+In addition you have to store whatever information you need about the relation
+in your `process_relation()` function in a global variable.
+
+After all relations are processed, osm2pgsql will reprocess all marked ways by
+calling the `process_way()` function for them again. This time around you have
+the information from the relation in the global variable and can use it.
+
+If you don't mark any ways, nothing will be done in this reprocessing stage.
+
+(It is currently not possible to mark nodes or relations. This might or might
+not be added in future versions of osm2pgsql.)
+
+You can look at `osm2pgsql.stage` to see in which stage you are.
 
 You want to do all the processing you can in stage 1, because it is faster
-and there is less memory overhead. For most use cases, stage 1 is enough. If
-it is not, use stage 1 to store information about OSM objects you will need
-in stage 2 in some global variable. In stage 2 you can read this information
-again and use it to decide where and how to store the data in the database.
+and there is less memory overhead. For most use cases, stage 1 is enough.
+
+Processing in two stages can add quite a bit of overhead. Because this feature
+is new, there isn't much operational experience with it. So be a bit careful
+when you are experimenting and watch memory and disk space consumption and
+any extra time you are using. Keep in mind that:
+
+* All data stored in stage 1 for use in stage 2 in your Lua script will use
+  main memory.
+* Keeping track of way ids marked in stage 1 needs some memory.
+* To do the extra processing in stage 2, time is needed to get objects out
+  of the object store and reprocess them.
+* Osm2pgsql will create an id index on all way tables to look up ways that
+  need to be deleted and re-created in stage 2.
 
 ## Command line options
 
```
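The reprocessing flow described in the new "Stages" text can be modelled in a few lines. The C++ sketch below is a toy model with hypothetical names (`FlexModel`, `Way`, `Relation`). It is not osm2pgsql's implementation and ignores the database, updates and multithreading. It only shows why a way that is read before its parent relation has to be processed a second time.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of the flex "reprocessing" flow; all names are hypothetical.
struct Way {
    std::int64_t id;
};

struct Relation {
    std::int64_t id;
    std::string ref;
    std::vector<std::int64_t> way_ids;
};

struct FlexModel {
    std::map<std::int64_t, std::string> way_ref; // "global variable" filled by process_relation()
    std::set<std::int64_t> marked;               // ways selected for reprocessing
    std::map<std::int64_t, std::string> rows;    // one output row per way

    void process_way(Way const &way)
    {
        auto const it = way_ref.find(way.id);
        // On the first pass no relation has been seen yet, so this is empty.
        rows[way.id] = (it == way_ref.end()) ? "" : it->second;
    }

    void process_relation(Relation const &rel)
    {
        for (auto id : rel.way_ids) {
            way_ref[id] = rel.ref;
        }
    }

    void run(std::vector<Way> const &ways, std::vector<Relation> const &rels)
    {
        for (auto const &way : ways) { // ways are read before relations
            process_way(way);
        }
        for (auto const &rel : rels) {
            // select_relation_members(): here every way member is selected
            marked.insert(rel.way_ids.begin(), rel.way_ids.end());
            process_relation(rel);
        }
        for (auto id : marked) { // reprocessing: the row for each marked way is replaced
            process_way(Way{id});
        }
    }
};
```

After `run()`, a way that belongs to a route carries the route's `ref`, while ways outside any route are processed only once.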

docs/lua-lib.md

Lines changed: 16 additions & 0 deletions
````diff
@@ -36,6 +36,22 @@ if object.tags.highway then
 end
 ```
 
+## `way_member_ids`
+
+Synopsis: `osm2pgsql.way_member_ids(RELATION)`
+
+Description: Return an array table with the ids of all way members of RELATION.
+
+Example:
+
+```
+function osm2pgsql.select_relation_members(relation)
+    if relation.tags.type == 'route' then
+        return { ways = osm2pgsql.way_member_ids(relation) }
+    end
+end
+```
+
 ## `make_clean_tags_func`
 
 Synopsis: `osm2pgsql.make_clean_tags_func(KEYS)`
````
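The behaviour of `osm2pgsql.way_member_ids()`, as implemented in `src/init.lua` in this commit, can be mirrored in C++. The first function does this. The second sketches the kind of custom loop the docs suggest when member roles matter. `Member`, `way_member_ids()` and `way_member_ids_with_role()` are hypothetical C++ names chosen for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical member record with the fields a Lua relation member has:
// type ('n', 'w' or 'r'), ref (the member's id) and role.
struct Member {
    char type;
    std::int64_t ref;
    std::string role;
};

// Mirrors the Lua helper: collects the refs of all way members in member
// order. A way listed twice in the relation appears twice in the result.
std::vector<std::int64_t> way_member_ids(std::vector<Member> const &members)
{
    std::vector<std::int64_t> ids;
    for (auto const &member : members) {
        if (member.type == 'w') {
            ids.push_back(member.ref);
        }
    }
    return ids;
}

// A custom variant for when roles matter: only way members with the given
// role are returned.
std::vector<std::int64_t>
way_member_ids_with_role(std::vector<Member> const &members,
                         std::string const &role)
{
    std::vector<std::int64_t> ids;
    for (auto const &member : members) {
        if (member.type == 'w' && member.role == role) {
            ids.push_back(member.ref);
        }
    }
    return ids;
}
```

Node and relation members are skipped in both cases, matching the `member.type == 'w'` check in the Lua code.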

flex-config/route-relations.lua

Lines changed: 39 additions & 29 deletions
```diff
@@ -22,8 +22,13 @@ tables.routes = osm2pgsql.define_relation_table('routes', {
     { column = 'tags', type = 'hstore' },
 })
 
--- This will be used to store lists of relation ids queryable by way id
-by_way_id = {}
+-- This will be used to store information about relations queryable by member
+-- way id. It is a table of tables. The outer table is indexed by the way id,
+-- the inner table indexed by the relation id. This way even if the information
+-- about a relation is added twice, it will be in there only once. It is
+-- always good to write your osm2pgsql Lua code in an idempotent way, i.e.
+-- it can be called any number of times and will lead to the same result.
+local w2r = {}
 
 function clean_tags(tags)
     tags.odbl = nil
```
```diff
@@ -40,54 +45,59 @@ function osm2pgsql.process_way(object)
         return
     end
 
-    -- In stage 1: Mark all remaining ways so we will see them again in stage 2
-    if osm2pgsql.stage == 1 then
-        osm2pgsql.mark_way(object.id)
-        return
-    end
-
-    -- We are now in stage 2
-
     clean_tags(object.tags)
 
-    -- Data we will store in the "highways" table always has the way tags
+    -- Data we will store in the "highways" table always has the tags from
+    -- the way
     local row = {
         tags = object.tags
     }
 
-    -- If there is any data from relations, add it in
-    local d = by_way_id[object.id]
+    -- If there is any data from parent relations, add it in
+    local d = w2r[object.id]
     if d then
-        table.sort(d.refs)
-        table.sort(d.ids)
-        row.rel_refs = table.concat(d.refs, ',')
-        row.rel_ids = '{' .. table.concat(d.ids, ',') .. '}'
+        local refs = {}
+        local ids = {}
+        for rel_id, rel_ref in pairs(d) do
+            refs[#refs + 1] = rel_ref
+            ids[#ids + 1] = rel_id
+        end
+        table.sort(refs)
+        table.sort(ids)
+        row.rel_refs = table.concat(refs, ',')
+        row.rel_ids = '{' .. table.concat(ids, ',') .. '}'
     end
 
     tables.highways:add_row(row)
 end
 
-function osm2pgsql.process_relation(object)
+-- This function is called for every added, modified, or deleted relation.
+-- Its only job is to return the ids of all member ways of the specified
+-- relation we want to see in stage 2 again. It MUST NOT store any information
+-- about the relation!
+function osm2pgsql.select_relation_members(relation)
     -- Only interested in relations with type=route, route=road and a ref
+    if relation.tags.type == 'route' and relation.tags.route == 'road' and relation.tags.ref then
+        return { ways = osm2pgsql.way_member_ids(relation) }
+    end
+end
+
+-- The process_relation() function should store all information about way
+-- members that might be needed in stage 2.
+function osm2pgsql.process_relation(object)
     if object.tags.type == 'route' and object.tags.route == 'road' and object.tags.ref then
         tables.routes:add_row({
-            tags = object.tags,
-            geom = { create = 'line' }
+            tags = object.tags
         })
 
-        -- Go through all the members and store relation ids and refs so it
+        -- Go through all the members and store relation ids and refs so they
         -- can be found by the way id.
         for _, member in ipairs(object.members) do
             if member.type == 'w' then
-                if not by_way_id[member.ref] then
-                    by_way_id[member.ref] = {
-                        ids = {},
-                        refs = {}
-                    }
+                if not w2r[member.ref] then
+                    w2r[member.ref] = {}
                 end
-                local d = by_way_id[member.ref]
-                table.insert(d.ids, object.id)
-                table.insert(d.refs, object.tags.ref)
+                w2r[member.ref][object.id] = object.tags.ref
             end
         end
     end
```
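The `w2r` table of tables in `route-relations.lua` translates naturally into a C++ map of maps. In this sketch, `W2R`, `record_relation()`, `join()` and `relation_columns()` are hypothetical names, and the code assumes C++11 or later. Recording the same relation twice leaves the structure unchanged, which is the idempotency the Lua comment asks for. The output mimics the sorted `rel_refs` and `rel_ids` column values built in `process_way()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// way id -> (relation id -> relation ref), like w2r in route-relations.lua.
using W2R = std::map<std::int64_t, std::map<std::int64_t, std::string>>;

// Records one route relation. Because the inner map is keyed by relation id,
// recording the same relation again does not add a duplicate entry.
void record_relation(W2R &w2r, std::int64_t rel_id, std::string const &ref,
                     std::vector<std::int64_t> const &way_ids)
{
    for (auto way_id : way_ids) {
        w2r[way_id][rel_id] = ref;
    }
}

// Joins strings with a separator, similar to Lua's table.concat(list, sep).
std::string join(std::vector<std::string> const &parts, char sep)
{
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            out += sep;
        }
        out += parts[i];
    }
    return out;
}

// Returns the {rel_refs, rel_ids} column values for a way, or two empty
// strings if the way is not a member of any recorded relation.
std::pair<std::string, std::string> relation_columns(W2R const &w2r,
                                                     std::int64_t way_id)
{
    auto const it = w2r.find(way_id);
    if (it == w2r.end()) {
        return {"", ""};
    }
    std::vector<std::string> refs;
    std::vector<std::string> ids; // std::map iterates in ascending id order
    for (auto const &entry : it->second) {
        ids.push_back(std::to_string(entry.first));
        refs.push_back(entry.second);
    }
    std::sort(refs.begin(), refs.end());
    return {join(refs, ','), "{" + join(ids, ',') + "}"};
}
```

Unlike a Lua table, `std::map` already iterates its keys in order, so only the refs need an explicit sort here.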

src/init.lua

Lines changed: 8 additions & 2 deletions
```diff
@@ -29,8 +29,14 @@ function osm2pgsql.define_area_table(_name, _columns, _options)
     return _define_table_impl('area', _name, _columns, _options)
 end
 
-function osm2pgsql.mark_way(id)
-    return osm2pgsql.mark('w', id)
+function osm2pgsql.way_member_ids(relation)
+    local ids = {}
+    for _, member in ipairs(relation.members) do
+        if member.type == 'w' then
+            ids[#ids + 1] = member.ref
+        end
+    end
+    return ids
 end
 
 function osm2pgsql.clamp(value, low, high)
```

src/middle-pgsql.cpp

Lines changed: 0 additions & 2 deletions
```diff
@@ -580,8 +580,6 @@ void middle_pgsql_t::commit()
     m_db_copy.sync();
     // release the copy thread and its query connection
     m_copy_thread->finish();
-
-    m_db_connection.close();
 }
 
 void middle_pgsql_t::flush() { m_db_copy.sync(); }
```

src/osmdata.cpp

Lines changed: 40 additions & 10 deletions
```diff
@@ -114,6 +114,10 @@ void osmdata_t::relation_modify(osmium::Relation const &rel) const
 {
     auto &slim = slim_middle();
 
+    for (auto &out : m_outs) {
+        out->select_relation_members(rel.id());
+    }
+
     slim.relation_delete(rel.id());
     slim.relation_set(rel);
 
```

```diff
@@ -223,6 +227,18 @@ class multithreaded_processor
         process_queue("relation", std::move(list), &output_t::pending_relation);
     }
 
+    /**
+     * Process all relations in the list in stage1c.
+     *
+     * \param list List of relation ids to work on. The list is moved into the
+     *             function.
+     */
+    void process_relations_stage1c(idlist_t &&list)
+    {
+        process_queue("relation", std::move(list),
+                      &output_t::pending_relation_stage1c);
+    }
+
     /**
      * Collect expiry tree information from all clones and merge it back
      * into the original outputs.
```
/**
227243
* Collect expiry tree information from all clones and merge it back
228244
* into the original outputs.
@@ -371,27 +387,41 @@ progress_display_t osmdata_t::process_file(osmium::io::File const &file,
371387
return handler.progress();
372388
}
373389

374-
void osmdata_t::process_stage1b() const
390+
void osmdata_t::process_dependents() const
375391
{
376-
if (m_dependency_manager->has_pending()) {
377-
multithreaded_processor proc{m_conninfo, m_mid, m_outs,
378-
(std::size_t)m_num_procs};
392+
multithreaded_processor proc{m_conninfo, m_mid, m_outs,
393+
(std::size_t)m_num_procs};
379394

395+
// stage 1b processing: process parents of changed objects
396+
if (m_dependency_manager->has_pending()) {
380397
proc.process_ways(m_dependency_manager->get_pending_way_ids());
381398
proc.process_relations(
382399
m_dependency_manager->get_pending_relation_ids());
383400
proc.merge_expire_trees();
384401
}
402+
403+
// stage 1c processing: mark parent relations of marked objects as changed
404+
for (auto &out : m_outs) {
405+
for (auto const id : out->get_marked_way_ids()) {
406+
m_dependency_manager->way_changed(id);
407+
}
408+
}
409+
410+
// process parent relations of marked ways
411+
if (m_dependency_manager->has_pending()) {
412+
proc.process_relations_stage1c(
413+
m_dependency_manager->get_pending_relation_ids());
414+
}
385415
}
386416

387-
void osmdata_t::process_stage2() const
417+
void osmdata_t::reprocess_marked() const
388418
{
389419
for (auto &out : m_outs) {
390-
out->stage2_proc();
420+
out->reprocess_marked();
391421
}
392422
}
393423

394-
void osmdata_t::process_stage3() const
424+
void osmdata_t::postprocess_database() const
395425
{
396426
// All the intensive parts of this are long-running PostgreSQL commands.
397427
// They will be run in a thread pool.
```diff
@@ -432,10 +462,10 @@ void osmdata_t::stop() const
     }
 
     if (m_append) {
-        process_stage1b();
+        process_dependents();
     }
 
-    process_stage2();
+    reprocess_marked();
 
-    process_stage3();
+    postprocess_database();
 }
```
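Stage 1c, as added to `process_dependents()`, feeds every marked way id back into the dependency manager so that the parent relations of marked ways are processed again. Presumably this makes the information from all parent relations of a marked way available in stage 2, including relations that did not change themselves. The sketch below models only that propagation step. `DependencyTracker` and `stage1c_relations()` are hypothetical and much simpler than osm2pgsql's dependency manager.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Toy dependency tracker (hypothetical, not osm2pgsql's dependency manager):
// it only knows which relations each way is a member of.
class DependencyTracker
{
public:
    void add_membership(std::int64_t way_id, std::int64_t rel_id)
    {
        m_parents[way_id].insert(rel_id);
    }

    // A changed way makes all of its parent relations pending.
    void way_changed(std::int64_t way_id)
    {
        auto const it = m_parents.find(way_id);
        if (it != m_parents.end()) {
            m_pending.insert(it->second.begin(), it->second.end());
        }
    }

    bool has_pending() const { return !m_pending.empty(); }

    // Hands out the pending relation ids (sorted, no duplicates) and clears them.
    std::vector<std::int64_t> get_pending_relation_ids()
    {
        std::vector<std::int64_t> ids(m_pending.begin(), m_pending.end());
        m_pending.clear();
        return ids;
    }

private:
    std::map<std::int64_t, std::set<std::int64_t>> m_parents;
    std::set<std::int64_t> m_pending;
};

// Stage 1c in miniature: every way marked for stage 2 is treated as changed,
// and the resulting parent relations are returned for processing.
std::vector<std::int64_t>
stage1c_relations(DependencyTracker &deps,
                  std::vector<std::int64_t> const &marked_way_ids)
{
    for (auto id : marked_way_ids) {
        deps.way_changed(id);
    }
    if (!deps.has_pending()) {
        return {};
    }
    return deps.get_pending_relation_ids();
}
```

For example, with way 1 in relations 100 and 200 and way 2 in relation 200, marking ways 1, 2 and 3 yields relations 100 and 200, each exactly once.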
