Commit 70b792b

Add a new middle database format
The database format we have been using for the middle has some problems:

* Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import.
* Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") if --extra-attributes is used. This can cause name clashes and format problems, and it makes the attributes difficult to access from the database. Attributes for nodes are never stored.
* Storing tags as an array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database.
* The way relation members are stored is rather arcane.
* When using --extra-attributes/-x, the middle tables become huge (due to storage in pseudo-tags).

This commit fixes all those problems by introducing a new database structure:

* Tags are stored in JSONB columns.
* The nodes table gets a new "tags" column.
* Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified, and they can be NULL if not used, which keeps the overhead tiny.
* Relation members are now stored as JSONB, as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members.
* The format for way nodes has been kept as an array of bigints.

The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "planet_osm" as default prefix) have been kept, but we might want to change this and get rid of the prefix; schemas are a better mechanism and have been available for a while.

There is a new table PREFIX_users, which maps user ids to user names. The user name isn't stored in the other tables, just the id.
This saves disk space and has the added benefit that the user name is updated correctly if a user changes their name.

There are two new command line options:

* --middle-database-format=FORMAT: 'legacy' (default) or 'new'.
* --middle-with-nodes: store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file.

For the first time, this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file: all nodes, ways, and relations with all their tags and attributes.

A new property "db_format" is written to the osm2pgsql_properties table with the value "0" (non-slim import), "1" (slim import with legacy format) or "2" (slim import with new format). This property is read in append mode, so the format the database was created with can be detected and handled appropriately.

This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere, and is well supported with regular releases, unlike the RapidJSON library we were using before.

Closes #692
Closes #1170
See #1502
1 parent c48168e commit 70b792b

17 files changed: +875 additions, -75 deletions

.github/actions/ubuntu-prerequisites/action.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -24,6 +24,7 @@ runs:
 libpotrace-dev \
 libpq-dev \
 libproj-dev \
+nlohmann-json3-dev \
 pandoc \
 postgresql-${POSTGRESQL_VERSION} \
 postgresql-${POSTGRESQL_VERSION}-postgis-${POSTGIS_VERSION} \
```

.github/actions/win-install/action.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -16,6 +16,7 @@ runs:
 expat:x64-windows \
 libpq:x64-windows \
 lua:x64-windows \
+nlohmann-json:x64-windows \
 proj4:x64-windows \
 zlib:x64-windows
 shell: bash
```

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -21,7 +21,7 @@ jobs:
 
 - name: Install prerequisites
 run: |
-brew install lua boost postgis pandoc cimg potrace
+brew install lua boost postgis pandoc cimg potrace nlohmann-json
 pip3 install psycopg2 behave osmium
 pg_ctl -D /usr/local/var/postgres init
 pg_ctl -D /usr/local/var/postgres start
```

.github/workflows/test-install.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -48,6 +48,7 @@ jobs:
 libpotrace-dev \
 libpq-dev \
 libproj-dev \
+nlohmann-json3-dev \
 lua${LUA_VERSION} \
 pandoc \
 postgresql-${POSTGRESQL_VERSION} \
```

CMakeLists.txt

Lines changed: 3 additions & 0 deletions
```diff
@@ -201,6 +201,9 @@ include_directories(SYSTEM ${PostgreSQL_INCLUDE_DIRS})
 
 find_package(Threads)
 
+find_path(NLOHMANN_INCLUDE_DIR nlohmann/json.hpp)
+include_directories(SYSTEM ${NLOHMANN_INCLUDE_DIR})
+
 find_path(POTRACE_INCLUDE_DIR potracelib.h)
 find_library(POTRACE_LIBRARY NAMES potrace)
 
```

README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -49,6 +49,7 @@ Required libraries are
 * [zlib](https://www.zlib.net/)
 * [Boost libraries](https://www.boost.org/), including geometry, system and
   filesystem
+* [nlohmann/json](https://json.nlohmann.me/)
 * [CImg](https://cimg.eu/) (Optional, for generalization only)
 * [potrace](https://potrace.sourceforge.net/) (Optional, for generalization only)
 * [PostgreSQL](https://www.postgresql.org/) client libraries
```

src/command-line-parser.cpp

Lines changed: 25 additions & 0 deletions
```diff
@@ -72,6 +72,8 @@ struct option const long_options[] = {
     {"merc", no_argument, nullptr, 'm'},
     {"middle-schema", required_argument, nullptr, 215},
     {"middle-way-node-index-id-shift", required_argument, nullptr, 300},
+    {"middle-database-format", required_argument, nullptr, 301},
+    {"middle-with-nodes", no_argument, nullptr, 302},
     {"multi-geometry", no_argument, nullptr, 'G'},
     {"number-processes", required_argument, nullptr, 205},
     {"output", required_argument, nullptr, 'O'},
```

```diff
@@ -450,6 +452,11 @@ static void check_options(options_t *options)
         throw std::runtime_error{"--drop only makes sense with --slim."};
     }
 
+    if (options->append && options->middle_database_format != 1) {
+        throw std::runtime_error{
+            "Do not use --middle-database-format with --append."};
+    }
+
     if (options->hstore_mode == hstore_column::none &&
         options->hstore_columns.empty() && options->hstore_match_only) {
         log_warn("--hstore-match-only only makes sense with --hstore, "
```

```diff
@@ -722,6 +729,20 @@ options_t parse_command_line(int argc, char *argv[])
         case 300: // --middle-way-node-index-id-shift
             options.way_node_index_id_shift = atoi(optarg);
             break;
+        case 301: // --middle-database-format
+            if (optarg == std::string{"legacy"}) {
+                options.middle_database_format = 1;
+            } else if (optarg == std::string{"new"}) {
+                options.middle_database_format = 2;
+            } else {
+                throw std::runtime_error{
+                    "Unknown value for --middle-database-format (Use 'legacy' "
+                    "or 'new')."};
+            }
+            break;
+        case 302: // --middle-with-nodes
+            options.middle_with_nodes = true;
+            break;
         case 400: // --log-level=LEVEL
             parse_log_level_param(optarg);
             break;
```

```diff
@@ -771,5 +792,9 @@ options_t parse_command_line(int argc, char *argv[])
 
     options.conninfo = build_conninfo(database_options);
 
+    if (!options.slim) {
+        options.middle_database_format = 0;
+    }
+
     return options;
 }
```
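The option handling in these hunks can be restated as a few standalone functions. This is a minimal sketch under stated assumptions: `parse_middle_database_format()`, `effective_db_format()` and `store_node_in_database()` are hypothetical helpers, while osm2pgsql itself keeps this state in `options_t`. The first mirrors `case 301`, the second mirrors the `!options.slim` override, and the third encodes the --middle-with-nodes rule as worded in the commit message, which the hunks shown here do not implement.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Numbering used in the commit message and the "db_format" property:
// 0 = non-slim import, 1 = slim with legacy format, 2 = slim with new format.

// Map the --middle-database-format argument to its number (mirrors case 301).
int parse_middle_database_format(std::string const &arg)
{
    if (arg == "legacy") {
        return 1;
    }
    if (arg == "new") {
        return 2;
    }
    throw std::runtime_error{
        "Unknown value for --middle-database-format (Use 'legacy' or 'new')."};
}

// Value that ends up as "db_format": non-slim imports always get 0
// (mirrors the !options.slim check at the end of parse_command_line()).
int effective_db_format(bool slim, int requested_format)
{
    assert(requested_format == 1 || requested_format == 2);
    return slim ? requested_format : 0;
}

// Hypothetical helper derived from the commit message, not from osm2pgsql:
// without a flat-node file every node is stored in the database; with one,
// only tagged nodes are stored, and only if --middle-with-nodes is set.
bool store_node_in_database(bool has_tags, bool uses_flat_nodes,
                            bool middle_with_nodes)
{
    return !uses_flat_nodes || (has_tags && middle_with_nodes);
}
```

Because the diff applies the non-slim override only after the option loop has finished, the resulting format does not depend on the order in which --slim and --middle-database-format appear on the command line.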