
Commit 5191ed1

Merge pull request #1215 from joto/fix-multistage-processing-take2
Fix multistage processing (take 2)
2 parents 9d7b577 + 4e64912 commit 5191ed1


13 files changed: +735 -145 lines changed

docs/flex.md

Lines changed: 68 additions & 21 deletions
Original file line number | Diff line number | Diff line change
@@ -36,14 +36,14 @@ The following functions are defined:
3636
* `osm2pgsql.define_table(options)`: Define a table. This is the more flexible
3737
function behind all the other `define_*_table()` functions. It gives you
3838
more control than the more convenient other functions.
39-
* `osm2pgsql.mark_way(id)`: Mark the OSM way with the specified id. This way
40-
will be processed (again) in stage 2.
4139

4240
You are expected to define one or more of the following functions:
4341

44-
* `osm2pgsql.process_node()`: Called for each node.
45-
* `osm2pgsql.process_way()`: Called for each way.
46-
* `osm2pgsql.process_relation()`: Called for each relation.
42+
* `osm2pgsql.process_node()`: Called for each new or changed node.
43+
* `osm2pgsql.process_way()`: Called for each new or changed way.
44+
* `osm2pgsql.process_relation()`: Called for each new or changed relation.
45+
* `osm2pgsql.select_relation_members()`: Called for each deleted or added
46+
relation. See below for more details.
4747

4848
Osm2pgsql also provides some additional functions in the
4949
[Lua helper library](lua-lib.md).
@@ -76,7 +76,7 @@ stored as is, relation ids will be stored as negative numbers.
7676
With the `osm2pgsql.define_table()` function you can also define tables that
7777
* don't have any ids, but those tables will never be updated by osm2pgsql
7878
* take *any OSM object*, in this case the type of object is stored in an
79-
additional column.
79+
additional `char(1)` column.
8080
* are in a specific PostgreSQL tablespace (set option `data_tablespace`) or
8181
that get their indexes created in a specific tablespace (set option
8282
`index_tablespace`).
@@ -242,25 +242,72 @@ a default transformation. These are the defaults:
242242

243243
## Stages
244244

245-
Osm2pgsql processes the data in up to two stages. You can mark ways in stage 1
246-
for processing in stage 2 by calling `osm2pgsql.mark_way(id)`. If you don't
247-
mark any ways, nothing will be done in stage 2.
245+
When processing OSM data, osm2pgsql reads the input file(s) in order, nodes
246+
first, then ways, then relations. This means that when the ways are read and
247+
processed, osm2pgsql can't know yet whether a way is in a relation (or in
248+
several). But for some use cases we need to know in which relations a way is
249+
and what the tags of these relations are or the roles of those member ways.
250+
A typical case is a relation of type `route` (bus routes etc.) where we
251+
might want to render the `name` or `ref` from the route relation onto the
252+
way geometry.
253+
254+
The osm2pgsql flex backend supports this use case by adding an additional
255+
"reprocessing" step. Osm2pgsql will call the Lua function
256+
`osm2pgsql.select_relation_members()` for each added, modified, or deleted
257+
relation. Your job is to figure out which way members in that relation might
258+
need the information from the relation to be rendered correctly and return
259+
those ids in a Lua table with the only field 'ways'. This is usually done with
260+
a function like this:
248261

249-
You can look at `osm2pgsql.stage` to see in which stage you are.
262+
```
263+
function osm2pgsql.select_relation_members(relation)
264+
    if relation.tags.type == 'route' then
265+
        return { ways = osm2pgsql.way_member_ids(relation) }
266+
    end
267+
end
268+
```
269+
270+
Instead of using the helper function `osm2pgsql.way_member_ids()` which
271+
returns the ids of all way members, you can write your own code, for instance
272+
if you want to check the roles.
273+
274+
Note that `select_relation_members()` is called for deleted relations and for
275+
the old version of a modified relation as well as for new relations and the
276+
new version of a modified relation. This is needed, for instance, to correctly
277+
mark member ways of deleted relations, because they need to be updated, too.
278+
The decision whether a way is to be marked or not can only be based on the
279+
tags of the relation and/or the roles of the members. If you take other
280+
information into account, updates might not work correctly.
250281

251-
In stage 1 you can only look at each OSM object on its own. You can see
252-
its id and tags (and possibly timestamp, changeset, user, etc.), but you don't
253-
know how this OSM objects relates to other OSM objects (for instance whether a
254-
way you are looking at is a member in a relation). If this is enough to decide
255-
in which database table(s) and with what data an OSM object should end up in,
256-
then you can process the OSM object in stage 1. If, on the other hand, you
257-
need some extra information, you have to defer processing to the second stage.
282+
In addition you have to store whatever information you need about the relation
283+
in your `process_relation()` function in a global variable.
284+
285+
After all relations are processed, osm2pgsql will reprocess all marked ways by
286+
calling the `process_way()` function for them again. This time around you have
287+
the information from the relation in the global variable and can use it.
288+
289+
If you don't mark any ways, nothing will be done in this reprocessing stage.
290+
291+
(It is currently not possible to mark nodes or relations. This might or might
292+
not be added in future versions of osm2pgsql.)
293+
294+
You can look at `osm2pgsql.stage` to see in which stage you are.
258295

259296
You want to do all the processing you can in stage 1, because it is faster
260-
and there is less memory overhead. For most use cases, stage 1 is enough. If
261-
it is not, use stage 1 to store information about OSM objects you will need
262-
in stage 2 in some global variable. In stage 2 you can read this information
263-
again and use it to decide where and how to store the data in the database.
297+
and there is less memory overhead. For most use cases, stage 1 is enough.
298+
299+
Processing in two stages can add quite a bit of overhead. Because this feature
300+
is new, there isn't much operational experience with it. So be a bit careful
301+
when you are experimenting and watch memory and disk space consumption and
302+
any extra time you are using. Keep in mind that:
303+
304+
* All data stored in stage 1 for use in stage 2 in your Lua script will use
305+
main memory.
306+
* Keeping track of way ids marked in stage 1 needs some memory.
307+
* To do the extra processing in stage 2, time is needed to get objects out
308+
of the object store and reprocess them.
309+
* Osm2pgsql will create an id index on all way tables to look up ways that
310+
need to be deleted and re-created in stage 2.
264311

265312
## Command line options
266313

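The flex.md text above notes that you can write your own selection code instead of calling `osm2pgsql.way_member_ids()`, for instance to check roles. A minimal sketch of that idea follows. It assumes each member carries a `role` field next to the `type` and `ref` fields used elsewhere in this commit. The `wanted_roles` set and the sample relation are made up for illustration, and the local `osm2pgsql` table is only a stand-in so the snippet runs in a plain Lua interpreter.

```lua
-- Stand-in for the table osm2pgsql provides to a real config file.
local osm2pgsql = {}

-- Hypothetical set of member roles we want to reprocess (an assumption).
local wanted_roles = { forward = true, backward = true, [''] = true }

function osm2pgsql.select_relation_members(relation)
    if relation.tags.type ~= 'route' then
        return
    end
    local ids = {}
    for _, member in ipairs(relation.members) do
        -- The decision uses only relation tags and member roles,
        -- as the documentation above requires.
        if member.type == 'w' and wanted_roles[member.role] then
            ids[#ids + 1] = member.ref
        end
    end
    return { ways = ids }
end

-- Hand-made relation mimicking the object shape used in this commit.
local sample = {
    id = 1,
    tags = { type = 'route', route = 'bus' },
    members = {
        { type = 'w', ref = 10, role = 'forward' },
        { type = 'w', ref = 11, role = 'platform' },
        { type = 'n', ref = 12, role = 'stop' },
        { type = 'w', ref = 13, role = '' },
    },
}

local result = osm2pgsql.select_relation_members(sample)
print(table.concat(result.ways, ','))
```

Run on the sample relation, this prints `10,13`: the way with role `platform` and the node member are left out.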
docs/lua-lib.md

Lines changed: 16 additions & 0 deletions
@@ -36,6 +36,22 @@ if object.tags.highway then
3636
end
3737
```
3838

39+
## `way_member_ids`
40+
41+
Synopsis: `osm2pgsql.way_member_ids(RELATION)`
42+
43+
Description: Return an array table with the ids of all way members of RELATION.
44+
45+
Example:
46+
47+
```
48+
function osm2pgsql.select_relation_members(relation)
49+
    if relation.tags.type == 'route' then
50+
        return { ways = osm2pgsql.way_member_ids(relation) }
51+
    end
52+
end
53+
```
54+
3955
## `make_clean_tags_func`
4056

4157
Synopsis: `osm2pgsql.make_clean_tags_func(KEYS)`

flex-config/route-relations.lua

Lines changed: 39 additions & 29 deletions
@@ -22,8 +22,13 @@ tables.routes = osm2pgsql.define_relation_table('routes', {
2222
{ column = 'tags', type = 'hstore' },
2323
})
2424

25-
-- This will be used to store lists of relation ids queryable by way id
26-
by_way_id = {}
25+
-- This will be used to store information about relations queryable by member
26+
-- way id. It is a table of tables. The outer table is indexed by the way id,
27+
-- the inner table indexed by the relation id. This way even if the information
28+
-- about a relation is added twice, it will be in there only once. It is
29+
-- always good to write your osm2pgsql Lua code in an idempotent way, i.e.
30+
-- it can be called any number of times and will lead to the same result.
31+
local w2r = {}
2732

2833
function clean_tags(tags)
2934
tags.odbl = nil
@@ -40,54 +45,59 @@ function osm2pgsql.process_way(object)
4045
return
4146
end
4247

43-
-- In stage 1: Mark all remaining ways so we will see them again in stage 2
44-
if osm2pgsql.stage == 1 then
45-
osm2pgsql.mark_way(object.id)
46-
return
47-
end
48-
49-
-- We are now in stage 2
50-
5148
clean_tags(object.tags)
5249

53-
-- Data we will store in the "highways" table always has the way tags
50+
-- Data we will store in the "highways" table always has the tags from
51+
-- the way
5452
local row = {
5553
tags = object.tags
5654
}
5755

58-
-- If there is any data from relations, add it in
59-
local d = by_way_id[object.id]
56+
-- If there is any data from parent relations, add it in
57+
local d = w2r[object.id]
6058
if d then
61-
table.sort(d.refs)
62-
table.sort(d.ids)
63-
row.rel_refs = table.concat(d.refs, ',')
64-
row.rel_ids = '{' .. table.concat(d.ids, ',') .. '}'
59+
local refs = {}
60+
local ids = {}
61+
for rel_id, rel_ref in pairs(d) do
62+
refs[#refs + 1] = rel_ref
63+
ids[#ids + 1] = rel_id
64+
end
65+
table.sort(refs)
66+
table.sort(ids)
67+
row.rel_refs = table.concat(refs, ',')
68+
row.rel_ids = '{' .. table.concat(ids, ',') .. '}'
6569
end
6670

6771
tables.highways:add_row(row)
6872
end
6973

70-
function osm2pgsql.process_relation(object)
74+
-- This function is called for every added, modified, or deleted relation.
75+
-- Its only job is to return the ids of all member ways of the specified
76+
-- relation we want to see in stage 2 again. It MUST NOT store any information
77+
-- about the relation!
78+
function osm2pgsql.select_relation_members(relation)
7179
-- Only interested in relations with type=route, route=road and a ref
80+
if relation.tags.type == 'route' and relation.tags.route == 'road' and relation.tags.ref then
81+
return { ways = osm2pgsql.way_member_ids(relation) }
82+
end
83+
end
84+
85+
-- The process_relation() function should store all information about way
86+
-- members that might be needed in stage 2.
87+
function osm2pgsql.process_relation(object)
7288
if object.tags.type == 'route' and object.tags.route == 'road' and object.tags.ref then
7389
tables.routes:add_row({
74-
tags = object.tags,
75-
geom = { create = 'line' }
90+
tags = object.tags
7691
})
7792

78-
-- Go through all the members and store relation ids and refs so it
93+
-- Go through all the members and store relation ids and refs so they
7994
-- can be found by the way id.
8095
for _, member in ipairs(object.members) do
8196
if member.type == 'w' then
82-
if not by_way_id[member.ref] then
83-
by_way_id[member.ref] = {
84-
ids = {},
85-
refs = {}
86-
}
97+
if not w2r[member.ref] then
98+
w2r[member.ref] = {}
8799
end
88-
local d = by_way_id[member.ref]
89-
table.insert(d.ids, object.id)
90-
table.insert(d.refs, object.tags.ref)
100+
w2r[member.ref][object.id] = object.tags.ref
91101
end
92102
end
93103
end
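The comment on `w2r` in this file explains that keying the inner table by relation id keeps the data idempotent. The following standalone sketch illustrates that property with made-up ids and refs. The helper names `remember` and `relation_columns` are hypothetical and only wrap the same statements used in `process_relation()` and `process_way()` above.

```lua
-- Outer table indexed by way id, inner table indexed by relation id.
local w2r = {}

-- Hypothetical helper: record that relation rel_id (with ref rel_ref)
-- has way_id as a member. Calling it twice with the same arguments
-- leaves w2r unchanged.
local function remember(way_id, rel_id, rel_ref)
    if not w2r[way_id] then
        w2r[way_id] = {}
    end
    w2r[way_id][rel_id] = rel_ref
end

-- Hypothetical helper: build the rel_refs and rel_ids values the same
-- way process_way() does in the diff above.
local function relation_columns(way_id)
    local d = w2r[way_id]
    if not d then
        return nil
    end
    local refs = {}
    local ids = {}
    for rel_id, rel_ref in pairs(d) do
        refs[#refs + 1] = rel_ref
        ids[#ids + 1] = rel_id
    end
    table.sort(refs)
    table.sort(ids)
    return table.concat(refs, ','), '{' .. table.concat(ids, ',') .. '}'
end

remember(100, 7, 'A 1')
remember(100, 5, 'B 2')
remember(100, 7, 'A 1') -- the same relation seen a second time

local rel_refs, rel_ids = relation_columns(100)
print(rel_refs)
print(rel_ids)
```

Even though relation 7 is recorded twice, `rel_refs` is `A 1,B 2` and `rel_ids` is `{5,7}`. With the previous list-based `by_way_id` structure the repeated call would have appended a duplicate entry.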

src/init.lua

Lines changed: 8 additions & 2 deletions
@@ -29,8 +29,14 @@ function osm2pgsql.define_area_table(_name, _columns, _options)
2929
return _define_table_impl('area', _name, _columns, _options)
3030
end
3131

32-
function osm2pgsql.mark_way(id)
33-
return osm2pgsql.mark('w', id)
32+
function osm2pgsql.way_member_ids(relation)
33+
    local ids = {}
34+
    for _, member in ipairs(relation.members) do
35+
        if member.type == 'w' then
36+
            ids[#ids + 1] = member.ref
37+
        end
38+
    end
39+
    return ids
3440
end
3541

3642
function osm2pgsql.clamp(value, low, high)
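The `way_member_ids()` helper added in this file can be tried outside osm2pgsql. The sketch below copies its body into a stand-in `osm2pgsql` table and feeds it a hand-made relation; the member list is invented for illustration.

```lua
-- Stand-in for the table osm2pgsql provides; not the real module.
local osm2pgsql = {}

-- Same logic as the helper added to src/init.lua in this commit.
function osm2pgsql.way_member_ids(relation)
    local ids = {}
    for _, member in ipairs(relation.members) do
        if member.type == 'w' then
            ids[#ids + 1] = member.ref
        end
    end
    return ids
end

-- Invented relation with mixed member types.
local relation = {
    members = {
        { type = 'n', ref = 1 },
        { type = 'w', ref = 20 },
        { type = 'r', ref = 3 },
        { type = 'w', ref = 21 },
    },
}

local ids = osm2pgsql.way_member_ids(relation)
print(table.concat(ids, ','))
```

Only the two way members survive, in member order, so this prints `20,21`. A relation without way members yields an empty table rather than `nil`.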

src/middle-pgsql.cpp

Lines changed: 0 additions & 2 deletions
@@ -580,8 +580,6 @@ void middle_pgsql_t::commit()
580580
m_db_copy.sync();
581581
// release the copy thread and its query connection
582582
m_copy_thread->finish();
583-
584-
m_db_connection.close();
585583
}
586584

587585
void middle_pgsql_t::flush() { m_db_copy.sync(); }

src/osmdata.cpp

Lines changed: 40 additions & 10 deletions
@@ -114,6 +114,10 @@ void osmdata_t::relation_modify(osmium::Relation const &rel) const
114114
{
115115
auto &slim = slim_middle();
116116

117+
for (auto &out : m_outs) {
118+
out->select_relation_members(rel.id());
119+
}
120+
117121
slim.relation_delete(rel.id());
118122
slim.relation_set(rel);
119123

@@ -223,6 +227,18 @@ class multithreaded_processor
223227
process_queue("relation", std::move(list), &output_t::pending_relation);
224228
}
225229

230+
/**
231+
* Process all relations in the list in stage1c.
232+
*
233+
* \param list List of relation ids to work on. The list is moved into the
234+
* function.
235+
*/
236+
void process_relations_stage1c(idlist_t &&list)
237+
{
238+
process_queue("relation", std::move(list),
239+
&output_t::pending_relation_stage1c);
240+
}
241+
226242
/**
227243
* Collect expiry tree information from all clones and merge it back
228244
* into the original outputs.
@@ -371,27 +387,41 @@ progress_display_t osmdata_t::process_file(osmium::io::File const &file,
371387
return handler.progress();
372388
}
373389

374-
void osmdata_t::process_stage1b() const
390+
void osmdata_t::process_dependents() const
375391
{
376-
if (m_dependency_manager->has_pending()) {
377-
multithreaded_processor proc{m_conninfo, m_mid, m_outs,
378-
(std::size_t)m_num_procs};
392+
multithreaded_processor proc{m_conninfo, m_mid, m_outs,
393+
(std::size_t)m_num_procs};
379394

395+
// stage 1b processing: process parents of changed objects
396+
if (m_dependency_manager->has_pending()) {
380397
proc.process_ways(m_dependency_manager->get_pending_way_ids());
381398
proc.process_relations(
382399
m_dependency_manager->get_pending_relation_ids());
383400
proc.merge_expire_trees();
384401
}
402+
403+
// stage 1c processing: mark parent relations of marked objects as changed
404+
for (auto &out : m_outs) {
405+
for (auto const id : out->get_marked_way_ids()) {
406+
m_dependency_manager->way_changed(id);
407+
}
408+
}
409+
410+
// process parent relations of marked ways
411+
if (m_dependency_manager->has_pending()) {
412+
proc.process_relations_stage1c(
413+
m_dependency_manager->get_pending_relation_ids());
414+
}
385415
}
386416

387-
void osmdata_t::process_stage2() const
417+
void osmdata_t::reprocess_marked() const
388418
{
389419
for (auto &out : m_outs) {
390-
out->stage2_proc();
420+
out->reprocess_marked();
391421
}
392422
}
393423

394-
void osmdata_t::process_stage3() const
424+
void osmdata_t::postprocess_database() const
395425
{
396426
// All the intensive parts of this are long-running PostgreSQL commands.
397427
// They will be run in a thread pool.
@@ -432,10 +462,10 @@ void osmdata_t::stop() const
432462
}
433463

434464
if (m_append) {
435-
process_stage1b();
465+
process_dependents();
436466
}
437467

438-
process_stage2();
468+
reprocess_marked();
439469

440-
process_stage3();
470+
postprocess_database();
441471
}
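The order of callbacks described in docs/flex.md (ways, then relations with `select_relation_members()` and `process_relation()`, then marked ways again) can be mimicked with a toy driver in plain Lua. This is only an illustration of the documented order from a config author's point of view. It does not reflect the C++ internals changed in this file (threads, the middle, append mode), and `run`, `rows`, and the sample data are hypothetical.

```lua
-- Stand-in for the global table osm2pgsql provides to a real config.
local osm2pgsql = { stage = 1 }

local w2r = {}  -- way id -> { [relation id] = ref }
local rows = {} -- stand-in for a database table, keyed by way id

function osm2pgsql.select_relation_members(relation)
    if relation.tags.type == 'route' then
        local ids = {}
        for _, m in ipairs(relation.members) do
            if m.type == 'w' then ids[#ids + 1] = m.ref end
        end
        return { ways = ids }
    end
end

function osm2pgsql.process_relation(relation)
    if relation.tags.type == 'route' then
        for _, m in ipairs(relation.members) do
            if m.type == 'w' then
                w2r[m.ref] = w2r[m.ref] or {}
                w2r[m.ref][relation.id] = relation.tags.ref
            end
        end
    end
end

function osm2pgsql.process_way(way)
    local refs = {}
    for _, ref in pairs(w2r[way.id] or {}) do refs[#refs + 1] = ref end
    table.sort(refs)
    -- Overwriting the row imitates "delete and re-create" in stage 2.
    rows[way.id] = { stage = osm2pgsql.stage, rel_refs = table.concat(refs, ',') }
end

-- Toy driver (hypothetical): ways, then relations, then marked ways again.
local function run(ways, relations)
    local marked = {}
    osm2pgsql.stage = 1
    for _, way in ipairs(ways) do osm2pgsql.process_way(way) end
    for _, rel in ipairs(relations) do
        local selected = osm2pgsql.select_relation_members(rel)
        for _, id in ipairs(selected and selected.ways or {}) do
            marked[id] = true
        end
        osm2pgsql.process_relation(rel)
    end
    osm2pgsql.stage = 2
    for _, way in ipairs(ways) do
        if marked[way.id] then osm2pgsql.process_way(way) end
    end
end

run({ { id = 1 }, { id = 2 } },
    { { id = 9, tags = { type = 'route', ref = 'M1' },
        members = { { type = 'w', ref = 1 } } } })

print(rows[1].stage, rows[1].rel_refs)
print(rows[2].stage, rows[2].rel_refs)
```

Way 1 is a member of the route, so it is processed again in stage 2 and picks up the ref `M1`. Way 2 is never marked and keeps its stage 1 row with an empty `rel_refs`.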
