Some thoughts about the future of this project #53
Description
While this project has worked reasonably well for producing raster tiles on the German Tileserver in recent years, and has also been used by OpenMapTiles, the current approach has recently shown its limitations.
1st Issue
The project currently uses three FOSS transcription libraries:
- ICU for general-purpose transcription, http://site.icu-project.org/, written in C++
- Kakasi for Kanji transcription, http://kakasi.namazu.org/, written in C
- tltk for the Thai language, written in Python
Back in November, issue #35 was filed, and another library, pinyin_jyutping_sentence, written in Python, was found for this purpose.
As we had already noticed with tltk, a general problem of Python stored procedures seems to be that imported modules are not persistent between connections in PostgreSQL. Thus imports that take a long time (15 seconds in the case of pinyin_jyutping_sentence on my desktop computer and 5 seconds for tltk) will run not just once, but every time a new connection is made to the database.
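To make this cost concrete, the per-connection penalty can be approximated by timing an import in a fresh interpreter, which is roughly what a PL/Python function pays whenever a new connection imports a heavy module. A minimal sketch; the module being timed here is a stdlib stand-in for tltk or pinyin_jyutping_sentence, so the snippet stays self-contained:

```python
import subprocess
import sys
import time

def fresh_import_seconds(module: str) -> float:
    """Time how long a brand-new Python interpreter needs to import
    'module'.  This approximates what each new PostgreSQL connection
    pays when a PL/Python function imports a heavy library; the module
    name passed in is illustrative, not the real dependency."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    # On a typical desktop a heavy NLP library takes several seconds here,
    # while a small stdlib module is near-instant.
    print(f"{fresh_import_seconds('json'):.3f}s")
```

Run against the actual libraries, this reproduces the 5 to 15 second figures quoted above.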
2nd Issue
Also back in November, @chatelao added a couple more regular expressions for languages for which we did not yet have code to generate street abbreviations. Unfortunately, this code also slowed down execution of the functions by an order of magnitude, as @otbutz reported in issue #40.
Conclusion
Given that importing OSM data into PostgreSQL is in practice done only by osm2pgsql or imposm, those tools might be a better place to do localization. Currently (AFAIK) only osm2pgsql is able to do tag transformations at the import stage, but until recently it had a very static table setup for the target database. Fortunately, this has just changed with the advent of its new flex backend.
Basically this is why I think we should move the l10n stage to the PostgreSQL import stage instead of keeping it as stored procedures.
Another thing is the actual implementation of Latin transcription. I have a proof-of-concept implementation which does this, keeping the approach of the current project for now, available at https://github.com/giggls/osml10n.
The idea is to have an external daemon that does the actual transcription, while either a script run at import time from osm2pgsql or a replacement for the current osml10n_cc_translit function connects to this daemon to get a transliterated version of a given string.
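The daemon idea could look roughly like the sketch below. The line-based protocol, the threading server, and the diacritics-stripping transliteration are my own illustrative assumptions, not the actual osml10n implementation; the point is only that the expensive libraries would be imported once, at daemon start-up, instead of on every database connection:

```python
import socket
import socketserver
import threading
import unicodedata

def transliterate(text: str) -> str:
    # Stand-in for the real ICU/Kakasi/tltk call: strip combining
    # diacritics so the sketch is self-contained.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

class TranslitHandler(socketserver.StreamRequestHandler):
    """Assumed protocol: one UTF-8 line in, one transliterated line out."""
    def handle(self):
        line = self.rfile.readline().decode("utf-8").rstrip("\n")
        self.wfile.write((transliterate(line) + "\n").encode("utf-8"))

def ask_daemon(host: str, port: int, text: str) -> str:
    # What a replacement for osml10n_cc_translit (or an osm2pgsql-time
    # script) would do: connect, send the string, read the result.
    with socket.create_connection((host, port)) as sock:
        sock.sendall((text + "\n").encode("utf-8"))
        return sock.makefile("r", encoding="utf-8").readline().rstrip("\n")

if __name__ == "__main__":
    # Port 0 lets the OS pick a free port; a real daemon would listen on
    # a fixed, configured address.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), TranslitHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    print(ask_daemon(host, port, "Köln"))  # → Koln
    server.shutdown()
```

Because the daemon process stays alive across connections, the import cost is paid once per daemon start rather than once per database connection.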
This would at least resolve the problem of slow transcription procedures written in Python. However, the regular-expression problem might only be resolvable by moving this code into a (Lua) script run from osm2pgsql.
I am looking forward to your comments.