osm2pgsql-dev
diff --git a/docs/flex.md b/docs/flex.md
Lines changed: 68 additions & 21 deletions
diff --git a/docs/lua-lib.md b/docs/lua-lib.md
Lines changed: 16 additions & 0 deletions
diff --git a/flex-config/route-relations.lua b/flex-config/route-relations.lua
Lines changed: 39 additions & 29 deletions
diff --git a/src/init.lua b/src/init.lua
Lines changed: 8 additions & 2 deletions
diff --git a/src/middle-pgsql.cpp b/src/middle-pgsql.cpp
Lines changed: 0 additions & 2 deletions
diff --git a/src/osmdata.cpp b/src/osmdata.cpp
Lines changed: 40 additions & 10 deletions
@@ -36,14 +36,14 @@ The following functions are defined:
 * `osm2pgsql.define_table(options)`: Define a table. This is the more flexible
   function behind all the other `define_*_table()` functions. It gives you
   more control than the more convenient other functions.
-* `osm2pgsql.mark_way(id)`: Mark the OSM way with the specified id. This way
-  will be processed (again) in stage 2.
 
 You are expected to define one or more of the following functions:
 
-* `osm2pgsql.process_node()`: Called for each node.
-* `osm2pgsql.process_way()`: Called for each way.
-* `osm2pgsql.process_relation()`: Called for each relation.
+* `osm2pgsql.process_node()`: Called for each new or changed node.
+* `osm2pgsql.process_way()`: Called for each new or changed way.
+* `osm2pgsql.process_relation()`: Called for each new or changed relation.
+* `osm2pgsql.select_relation_members()`: Called for each added, modified, or
+  deleted relation. See below for more details.
 
 Osm2pgsql also provides some additional functions in the
 [Lua helper library](lua-lib.md).
@@ -76,7 +76,7 @@ stored as is, relation ids will be stored as negative numbers.
 With the `osm2pgsql.define_table()` function you can also define tables that
 * don't have any ids, but those tables will never be updated by osm2pgsql
 * take *any OSM object*, in this case the type of object is stored in an
-  additional column.
+  additional `char(1)` column.
 * are in a specific PostgreSQL tablespace (set option `data_tablespace`) or
   that get their indexes created in a specific tablespace (set option
   `index_tablespace`).
@@ -242,25 +242,72 @@ a default transformation. These are the defaults:
 
 ## Stages
 
-Osm2pgsql processes the data in up to two stages. You can mark ways in stage 1
-for processing in stage 2 by calling `osm2pgsql.mark_way(id)`. If you don't
-mark any ways, nothing will be done in stage 2.
+When processing OSM data, osm2pgsql reads the input file(s) in order: nodes
+first, then ways, then relations. This means that when the ways are read and
+processed, osm2pgsql can't know yet whether a way is a member of a relation
+(or of several). But for some use cases we need to know which relations a way
+belongs to, what the tags of these relations are, and which roles the member
+ways have. The typical case is a relation of type `route` (bus routes etc.)
+where we might want to render the `name` or `ref` from the route relation
+onto the way geometry.
+
+The osm2pgsql flex backend supports this use case by adding an additional
+"reprocessing" step. Osm2pgsql will call the Lua function
+`osm2pgsql.select_relation_members()` for each added, modified, or deleted
+relation. Your job is to figure out which way members in that relation might
+need the information from the relation to be rendered correctly and return
+those ids in a Lua table with the single field `ways`. This is usually done
+with a function like this:
 
-You can look at `osm2pgsql.stage` to see in which stage you are.
+```
+function osm2pgsql.select_relation_members(relation)
+    if relation.tags.type == 'route' then
+        return { ways = osm2pgsql.way_member_ids(relation) }
+    end
+end
+```
+
+Instead of using the helper function `osm2pgsql.way_member_ids()`, which
+returns the ids of all way members, you can write your own code, for instance
+if you want to check the roles.
+
+Note that `select_relation_members()` is called for deleted relations and for
+the old version of a modified relation as well as for new relations and the
+new version of a modified relation. This is needed, for instance, to correctly
+mark member ways of deleted relations, because they need to be updated, too.
+The decision whether a way is to be marked or not can only be based on the
+tags of the relation and/or the roles of the members. If you take other
+information into account, updates might not work correctly.
 
-In stage 1 you can only look at each OSM object on its own. You can see
-its id and tags (and possibly timestamp, changeset, user, etc.), but you don't
-know how this OSM objects relates to other OSM objects (for instance whether a
-way you are looking at is a member in a relation). If this is enough to decide
-in which database table(s) and with what data an OSM object should end up in,
-then you can process the OSM object in stage 1. If, on the other hand, you
-need some extra information, you have to defer processing to the second stage.
+In addition, your `process_relation()` function has to store whatever
+information you need about the relation in a global variable.
+
+After all relations are processed, osm2pgsql will reprocess all marked ways by
+calling the `process_way()` function for them again. This time around you have
+the information from the relation in the global variable and can use it.
+
+If you don't mark any ways, nothing will be done in this reprocessing stage.
+
+(It is currently not possible to mark nodes or relations. This might or might
+not be added in future versions of osm2pgsql.)
+
+You can look at `osm2pgsql.stage` to see in which stage you are.
 
 You want to do all the processing you can in stage 1, because it is faster
-and there is less memory overhead. For most use cases, stage 1 is enough. If
-it is not, use stage 1 to store information about OSM objects you will need
-in stage 2 in some global variable. In stage 2 you can read this information
-again and use it to decide where and how to store the data in the database.
+and there is less memory overhead. For most use cases, stage 1 is enough.
+
+Processing in two stages can add quite a bit of overhead. Because this feature
+is new, there isn't much operational experience with it. So be a bit careful
+when you are experimenting and watch memory and disk space consumption and
+any extra time you are using. Keep in mind that:
+
+* All data stored in stage 1 for use in stage 2 in your Lua script will use
+  main memory.
+* Keeping track of way ids marked in stage 1 needs some memory.
+* To do the extra processing in stage 2, time is needed to get objects out
+  of the object store and reprocess them.
+* Osm2pgsql will create an id index on all way tables to look up ways that
+  need to be deleted and re-created in stage 2.
 
 ## Command line options
 
 
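The `docs/flex.md` text in this diff mentions writing your own member selection instead of calling `osm2pgsql.way_member_ids()`, for instance to check roles. A minimal sketch of that idea follows. It is not part of the change set. The helper name `way_ids_with_roles`, the accepted role values and the sample relation are hypothetical, the `role` member field is assumed to sit next to the `type` and `ref` fields used in `flex-config/route-relations.lua`, and the `osm2pgsql` table is stubbed only so the snippet runs in a plain Lua interpreter.

```lua
-- Stub so the sketch runs outside osm2pgsql. Inside osm2pgsql the global
-- table already exists and is reused as is.
osm2pgsql = osm2pgsql or {}

-- Hypothetical helper: return the ids of all way members whose role is a
-- key in the `roles` set.
function way_ids_with_roles(relation, roles)
    local ids = {}
    for _, member in ipairs(relation.members) do
        if member.type == 'w' and roles[member.role] then
            ids[#ids + 1] = member.ref
        end
    end
    return ids
end

-- Roles we want to see again in stage 2 (illustrative values only).
local wanted_roles = { [''] = true, forward = true }

-- The decision uses only the relation tags and the member roles.
function osm2pgsql.select_relation_members(relation)
    if relation.tags.type == 'route' then
        return { ways = way_ids_with_roles(relation, wanted_roles) }
    end
end

-- Made-up relation for illustration.
local example = {
    tags = { type = 'route', route = 'road', ref = 'A 1' },
    members = {
        { type = 'w', ref = 10, role = '' },
        { type = 'n', ref = 11, role = 'stop' },
        { type = 'w', ref = 12, role = 'link' },
        { type = 'w', ref = 13, role = 'forward' },
    },
}

local selected = osm2pgsql.select_relation_members(example)
print(table.concat(selected.ways, ','))
```

With this sample relation only ways 10 and 13 are selected. The node member and the way with role `link` are skipped, and a relation that is not of type `route` yields `nil`, so none of its ways are marked.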
@@ -36,6 +36,22 @@ if object.tags.highway then
 end
 ```
 
+## `way_member_ids`
+
+Synopsis: `osm2pgsql.way_member_ids(RELATION)`
+
+Description: Return an array table with the ids of all way members of RELATION.
+
+Example:
+
+```
+function osm2pgsql.select_relation_members(relation)
+    if relation.tags.type == 'route' then
+        return { ways = osm2pgsql.way_member_ids(relation) }
+    end
+end
+```
+
 ## `make_clean_tags_func`
 
 Synopsis: `osm2pgsql.make_clean_tags_func(KEYS)`
 
@@ -22,8 +22,13 @@ tables.routes = osm2pgsql.define_relation_table('routes', {
     { column = 'tags', type = 'hstore' },
 })
 
--- This will be used to store lists of relation ids queryable by way id
-by_way_id = {}
+-- This will be used to store information about relations queryable by member
+-- way id. It is a table of tables. The outer table is indexed by the way id,
+-- the inner table indexed by the relation id. This way even if the information
+-- about a relation is added twice, it will be in there only once. It is
+-- always good to write your osm2pgsql Lua code in an idempotent way, i.e.
+-- it can be called any number of times and will lead to the same result.
+local w2r = {}
 
 function clean_tags(tags)
     tags.odbl = nil
@@ -40,54 +45,59 @@ function osm2pgsql.process_way(object)
         return
     end
 
-    -- In stage 1: Mark all remaining ways so we will see them again in stage 2
-    if osm2pgsql.stage == 1 then
-        osm2pgsql.mark_way(object.id)
-        return
-    end
-
-    -- We are now in stage 2
-
     clean_tags(object.tags)
 
-    -- Data we will store in the "highways" table always has the way tags
+    -- Data we will store in the "highways" table always has the tags from
+    -- the way
     local row = {
         tags = object.tags
     }
 
-    -- If there is any data from relations, add it in
-    local d = by_way_id[object.id]
+    -- If there is any data from parent relations, add it in
+    local d = w2r[object.id]
     if d then
-        table.sort(d.refs)
-        table.sort(d.ids)
-        row.rel_refs = table.concat(d.refs, ',')
-        row.rel_ids = '{' .. table.concat(d.ids, ',') .. '}'
+        local refs = {}
+        local ids = {}
+        for rel_id, rel_ref in pairs(d) do
+            refs[#refs + 1] = rel_ref
+            ids[#ids + 1] = rel_id
+        end
+        table.sort(refs)
+        table.sort(ids)
+        row.rel_refs = table.concat(refs, ',')
+        row.rel_ids = '{' .. table.concat(ids, ',') .. '}'
     end
 
     tables.highways:add_row(row)
 end
 
-function osm2pgsql.process_relation(object)
+-- This function is called for every added, modified, or deleted relation.
+-- Its only job is to return the ids of all member ways of the specified
+-- relation we want to see in stage 2 again. It MUST NOT store any information
+-- about the relation!
+function osm2pgsql.select_relation_members(relation)
     -- Only interested in relations with type=route, route=road and a ref
+    if relation.tags.type == 'route' and relation.tags.route == 'road' and relation.tags.ref then
+        return { ways = osm2pgsql.way_member_ids(relation) }
+    end
+end
+
+-- The process_relation() function should store all information about way
+-- members that might be needed in stage 2.
+function osm2pgsql.process_relation(object)
     if object.tags.type == 'route' and object.tags.route == 'road' and object.tags.ref then
         tables.routes:add_row({
-            tags = object.tags,
-            geom = { create = 'line' }
+            tags = object.tags
         })
 
-        -- Go through all the members and store relation ids and refs so it
+        -- Go through all the members and store relation ids and refs so they
         -- can be found by the way id.
         for _, member in ipairs(object.members) do
             if member.type == 'w' then
-                if not by_way_id[member.ref] then
-                    by_way_id[member.ref] = {
-                        ids = {},
-                        refs = {}
-                    }
+                if not w2r[member.ref] then
+                    w2r[member.ref] = {}
                 end
-                local d = by_way_id[member.ref]
-                table.insert(d.ids, object.id)
-                table.insert(d.refs, object.tags.ref)
+                w2r[member.ref][object.id] = object.tags.ref
             end
         end
     end
 
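The comment on `w2r` in `flex-config/route-relations.lua` argues for an idempotent table-of-tables layout. The following standalone sketch, under stated assumptions, shows why: recording the same relation twice for a way leaves the stored data unchanged. The helper names `remember` and `refs_and_ids` and the sample ids and refs are hypothetical; the logic mirrors the loops in `process_relation()` and `process_way()` from the diff above.

```lua
-- Outer table indexed by way id, inner table indexed by relation id.
local w2r = {}

-- Hypothetical helper mirroring the member loop in process_relation().
function remember(way_id, rel_id, rel_ref)
    if not w2r[way_id] then
        w2r[way_id] = {}
    end
    -- Assigning by key overwrites an existing entry instead of appending.
    w2r[way_id][rel_id] = rel_ref
end

-- Hypothetical helper mirroring the collection code in process_way().
-- Unlike process_way(), it returns empty strings for unknown ways instead
-- of skipping them.
function refs_and_ids(way_id)
    local refs, ids = {}, {}
    for rel_id, rel_ref in pairs(w2r[way_id] or {}) do
        refs[#refs + 1] = rel_ref
        ids[#ids + 1] = rel_id
    end
    table.sort(refs)
    table.sort(ids)
    return table.concat(refs, ','), '{' .. table.concat(ids, ',') .. '}'
end

remember(100, 7, 'B 2')
remember(100, 3, 'A 1')
remember(100, 7, 'B 2') -- same relation seen again: no duplicate entry

print(refs_and_ids(100))
```

Way 100 ends up with the refs `A 1,B 2` and the ids `{3,7}`, even though relation 7 was recorded twice. With the previous `table.insert()` approach, the repeated call would have appended a second copy of both values.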
@@ -29,8 +29,14 @@ function osm2pgsql.define_area_table(_name, _columns, _options)
     return _define_table_impl('area', _name, _columns, _options)
 end
 
-function osm2pgsql.mark_way(id)
-    return osm2pgsql.mark('w', id)
+function osm2pgsql.way_member_ids(relation)
+    local ids = {}
+    for _, member in ipairs(relation.members) do
+        if member.type == 'w' then
+            ids[#ids + 1] = member.ref
+        end
+    end
+    return ids
 end
 
 function osm2pgsql.clamp(value, low, high)
 
@@ -580,8 +580,6 @@ void middle_pgsql_t::commit()
     m_db_copy.sync();
     // release the copy thread and its query connection
     m_copy_thread->finish();
-
-    m_db_connection.close();
 }
 
 void middle_pgsql_t::flush() { m_db_copy.sync(); }
 
@@ -114,6 +114,10 @@ void osmdata_t::relation_modify(osmium::Relation const &rel) const
 {
     auto &slim = slim_middle();
 
+    for (auto &out : m_outs) {
+        out->select_relation_members(rel.id());
+    }
+
     slim.relation_delete(rel.id());
     slim.relation_set(rel);
 
@@ -223,6 +227,18 @@ class multithreaded_processor
         process_queue("relation", std::move(list), &output_t::pending_relation);
     }
 
+    /**
+     * Process all relations in the list in stage1c.
+     *
+     * \param list List of relation ids to work on. The list is moved into the
+     *             function.
+     */
+    void process_relations_stage1c(idlist_t &&list)
+    {
+        process_queue("relation", std::move(list),
+                      &output_t::pending_relation_stage1c);
+    }
+
     /**
      * Collect expiry tree information from all clones and merge it back
      * into the original outputs.
@@ -371,27 +387,41 @@ progress_display_t osmdata_t::process_file(osmium::io::File const &file,
     return handler.progress();
 }
 
-void osmdata_t::process_stage1b() const
+void osmdata_t::process_dependents() const
 {
-    if (m_dependency_manager->has_pending()) {
-        multithreaded_processor proc{m_conninfo, m_mid, m_outs,
-                                     (std::size_t)m_num_procs};
+    multithreaded_processor proc{m_conninfo, m_mid, m_outs,
+                                 (std::size_t)m_num_procs};
 
+    // stage 1b processing: process parents of changed objects
+    if (m_dependency_manager->has_pending()) {
         proc.process_ways(m_dependency_manager->get_pending_way_ids());
         proc.process_relations(
             m_dependency_manager->get_pending_relation_ids());
         proc.merge_expire_trees();
     }
+
+    // stage 1c processing: mark parent relations of marked objects as changed
+    for (auto &out : m_outs) {
+        for (auto const id : out->get_marked_way_ids()) {
+            m_dependency_manager->way_changed(id);
+        }
+    }
+
+    // process parent relations of marked ways
+    if (m_dependency_manager->has_pending()) {
+        proc.process_relations_stage1c(
+            m_dependency_manager->get_pending_relation_ids());
+    }
 }
 
-void osmdata_t::process_stage2() const
+void osmdata_t::reprocess_marked() const
 {
     for (auto &out : m_outs) {
-        out->stage2_proc();
+        out->reprocess_marked();
     }
 }
 
-void osmdata_t::process_stage3() const
+void osmdata_t::postprocess_database() const
 {
     // All the intensive parts of this are long-running PostgreSQL commands.
     // They will be run in a thread pool.
@@ -432,10 +462,10 @@ void osmdata_t::stop() const
     }
 
     if (m_append) {
-        process_stage1b();
+        process_dependents();
     }
 
-    process_stage2();
+    reprocess_marked();
 
-    process_stage3();
+    postprocess_database();
 }