@@ -36,14 +36,14 @@ The following functions are defined:
3636* ` osm2pgsql.define_table(options) ` : Define a table. This is the more flexible
3737 function behind all the other ` define_*_table() ` functions. It gives you
3838 more control than the more convenient other functions.
39- * ` osm2pgsql.mark_way(id) ` : Mark the OSM way with the specified id. This way
40- will be processed (again) in stage 2.
4139
4240You are expected to define one or more of the following functions:
4341
44- * ` osm2pgsql.process_node() ` : Called for each node.
45- * ` osm2pgsql.process_way() ` : Called for each way.
46- * ` osm2pgsql.process_relation() ` : Called for each relation.
42+ * ` osm2pgsql.process_node() ` : Called for each new or changed node.
43+ * ` osm2pgsql.process_way() ` : Called for each new or changed way.
44+ * ` osm2pgsql.process_relation() ` : Called for each new or changed relation.
45+ * ` osm2pgsql.select_relation_members() ` : Called for each added, modified,
46+ or deleted relation. See below for more details.
4747
4848Osm2pgsql also provides some additional functions in the
4949[Lua helper library](lua-lib.md).
@@ -76,7 +76,7 @@ stored as is, relation ids will be stored as negative numbers.
7676With the ` osm2pgsql.define_table() ` function you can also define tables that
7777* don't have any ids, but those tables will never be updated by osm2pgsql
7878* take *any OSM object*, in this case the type of object is stored in an
79- additional column.
79+ additional ` char(1) ` column.
8080* are in a specific PostgreSQL tablespace (set option ` data_tablespace ` ) or
8181 that get their indexes created in a specific tablespace (set option
8282 ` index_tablespace ` ).
@@ -242,25 +242,72 @@ a default transformation. These are the defaults:
242242
243243## Stages
244244
245- Osm2pgsql processes the data in up to two stages. You can mark ways in stage 1
246- for processing in stage 2 by calling ` osm2pgsql.mark_way(id) ` . If you don't
247- mark any ways, nothing will be done in stage 2.
245+ When processing OSM data, osm2pgsql reads the input file(s) in order, nodes
246+ first, then ways, then relations. This means that when the ways are read and
247+ processed, osm2pgsql can't know yet whether a way is a member of a relation
248+ (or of several). But for some use cases we need to know which relations a way
249+ belongs to, what the tags of those relations are, and which role the way has
250+ in them. The typical case is a relation of type ` route ` (bus routes etc.),
251+ where we might want to render the ` name ` or ` ref ` from the route relation
252+ onto the way geometry.
253+
254+ The osm2pgsql flex backend supports this use case by adding an additional
255+ "reprocessing" step. Osm2pgsql will call the Lua function
256+ ` osm2pgsql.select_relation_members() ` for each added, modified, or deleted
257+ relation. Your job is to figure out which way members in that relation might
258+ need the information from the relation to be rendered correctly and return
259+ those ids in a Lua table whose only field is ` ways `. This is usually done with
260+ a function like this:
248261
249- You can look at ` osm2pgsql.stage ` to see in which stage you are.
262+ ```
263+ function osm2pgsql.select_relation_members(relation)
264+     if relation.tags.type == 'route' then
265+         return { ways = osm2pgsql.way_member_ids(relation) }
266+     end
267+ end
268+ ```
269+
270+ Instead of using the helper function ` osm2pgsql.way_member_ids() ` which
271+ returns the ids of all way members, you can write your own code, for instance
272+ if you want to check the roles.
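
A role-aware variant might look like the following minimal sketch. It assumes that each entry in `relation.members` carries the fields `type` (`'n'`, `'w'` or `'r'`), `ref` (the member id) and `role`. The `skipped_roles` set is a hypothetical choice made for illustration, and the first line is only a stub so the snippet also runs outside osm2pgsql.

```lua
-- Stub so the sketch also runs outside osm2pgsql, where this global is
-- provided by the program itself.
osm2pgsql = osm2pgsql or {}

-- Hypothetical choice: way members with these roles are not marked.
local skipped_roles = { platform = true, stop = true }

function osm2pgsql.select_relation_members(relation)
    -- Only route relations are of interest here.
    if relation.tags.type ~= 'route' then
        return nil
    end

    local ids = {}
    for _, member in ipairs(relation.members) do
        -- Assumed member layout: type, ref (the id) and role.
        if member.type == 'w' and not skipped_roles[member.role] then
            ids[#ids + 1] = member.ref
        end
    end
    return { ways = ids }
end
```

The decision uses only the relation tags and the member roles, in line with the restriction described below.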
273+
274+ Note that ` select_relation_members() ` is called for deleted relations and for
275+ the old version of a modified relation as well as for new relations and the
276+ new version of a modified relation. This is needed, for instance, to correctly
277+ mark member ways of deleted relations, because they need to be updated, too.
278+ The decision whether a way is to be marked or not can only be based on the
279+ tags of the relation and/or the roles of the members. If you take other
280+ information into account, updates might not work correctly.
250281
251- In stage 1 you can only look at each OSM object on its own. You can see
252- its id and tags (and possibly timestamp, changeset, user, etc.), but you don't
253- know how this OSM objects relates to other OSM objects (for instance whether a
254- way you are looking at is a member in a relation). If this is enough to decide
255- in which database table(s) and with what data an OSM object should end up in,
256- then you can process the OSM object in stage 1. If, on the other hand, you
257- need some extra information, you have to defer processing to the second stage.
282+ In addition you have to store whatever information you need about the relation
283+ in your ` process_relation() ` function in a global variable.
284+
285+ After all relations are processed, osm2pgsql will reprocess all marked ways by
286+ calling the ` process_way() ` function for them again. This time around you have
287+ the information from the relation in the global variable and can use it.
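
The interplay between the two functions could be sketched as follows. This is an illustration under assumptions, not the canonical way to do it: the global `way_route_refs` and the helper `build_way_row()` are hypothetical names, the member fields `type` and `ref` are assumed as above, and the actual table insert is left out. The ways still have to be marked through `select_relation_members()` for the second call to happen.

```lua
-- Stub so the sketch also runs outside osm2pgsql, where this global is
-- provided by the program itself.
osm2pgsql = osm2pgsql or { stage = 1 }

-- Global lookup (hypothetical name): way id -> list of route refs.
way_route_refs = {}

function osm2pgsql.process_relation(relation)
    if relation.tags.type ~= 'route' or not relation.tags.ref then
        return
    end
    -- Remember the ref of this route for every way member.
    for _, member in ipairs(relation.members) do
        if member.type == 'w' then
            local refs = way_route_refs[member.ref] or {}
            refs[#refs + 1] = relation.tags.ref
            way_route_refs[member.ref] = refs
        end
    end
end

-- Hypothetical helper: builds the row a real script would write out.
function build_way_row(way)
    local row = { name = way.tags.name }
    local refs = way_route_refs[way.id]
    -- The relation data is only complete once stage 2 is reached.
    if osm2pgsql.stage == 2 and refs then
        row.route_refs = table.concat(refs, ',')
    end
    return row
end

function osm2pgsql.process_way(way)
    local row = build_way_row(way)
    -- A real script would now add `row` to a table created with
    -- osm2pgsql.define_table(); that part is omitted here.
end
```

Keeping the lookup keyed by way id means the stage 2 call needs only a single table access per way, at the cost of holding all collected refs in main memory.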
288+
289+ If you don't mark any ways, nothing will be done in this reprocessing stage.
290+
291+ (It is currently not possible to mark nodes or relations. This might or might
292+ not be added in future versions of osm2pgsql.)
293+
294+ You can look at ` osm2pgsql.stage ` to see in which stage you are.
258295
259296You want to do all the processing you can in stage 1, because it is faster
260- and there is less memory overhead. For most use cases, stage 1 is enough. If
261- it is not, use stage 1 to store information about OSM objects you will need
262- in stage 2 in some global variable. In stage 2 you can read this information
263- again and use it to decide where and how to store the data in the database.
297+ and there is less memory overhead. For most use cases, stage 1 is enough.
298+
299+ Processing in two stages can add quite a bit of overhead. Because this feature
300+ is new, there isn't much operational experience with it. So be a bit careful
301+ when you are experimenting and watch memory and disk space consumption and
302+ any extra time you are using. Keep in mind that:
303+
304+ * All data stored in stage 1 for use in stage 2 in your Lua script will use
305+ main memory.
306+ * Keeping track of way ids marked in stage 1 needs some memory.
307+ * To do the extra processing in stage 2, time is needed to get objects out
308+ of the object store and reprocess them.
309+ * Osm2pgsql will create an id index on all way tables to look up ways that
310+ need to be deleted and re-created in stage 2.
264311
265312## Command line options
266313