Add unaccent function based on PostgreSQL unaccent (Request in QGIS) #10579

@qgis-bot

Request for documentation

From pull request qgis/QGIS#64136
Author: @tudorbarascu
QGIS version: 4.0

Add unaccent function based on PostgreSQL unaccent

PR Description:

As in Romania we use diacritics, I've always wanted to have unaccent support in QGIS and as I've been using the wonderful unaccent extension from PostgreSQL, I think it's going to be a very good asset to put that into QGIS.

I basically took the code from there and adapted it as well as I could and soaked it with tests.

I also put in all the tests that were done in PostgreSQL so that we have good testing coverage.

I have a closed, stalled pull request (qgis/QGIS#62833) and I don't know how to reopen it, so I opened a new one.

So this is an upstream-compatible implementation of the PostgreSQL unaccent() function in QGIS, meant to ensure that the QGIS unaccent expression behaves identically to PostgreSQL's unaccent. To be sure, I tested every mapping in unaccent.rules with a script, comparing the unaccent(x) return value from QGIS with the result of SELECT unaccent(x) in PostgreSQL. I'm a little bit out of my depth here with all the cases, so I needed that confirmation.
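
The comparison described above could be sketched roughly as follows. This is not the script used in the PR, only a minimal illustration: `find_mismatches` is a hypothetical helper, and the two callables stand in for "evaluate unaccent(x) in QGIS" and "run SELECT unaccent(x) in PostgreSQL", which would have to be wired up separately.

```python
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    samples: Iterable[str],
    qgis_unaccent: Callable[[str], str],
    pg_unaccent: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (input, qgis_result, pg_result) for every input where the two differ.

    Both callables are assumptions: one would wrap the QGIS expression,
    the other a database query against PostgreSQL.
    """
    mismatches = []
    for text in samples:
        qgis_result = qgis_unaccent(text)
        pg_result = pg_unaccent(text)
        if qgis_result != pg_result:
            mismatches.append((text, qgis_result, pg_result))
    return mismatches
```

An empty result list would mean that both implementations agree on every sample, for example on every source string listed in unaccent.rules.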

PostgreSQL maintains a complete Unicode compatibility decomposition mapping through its unaccent.rules file, which is generated upstream. I made a scripts/generate_unaccent_rules.py script so that I can generate the src/core/qgsunaccent_generated_rules.cpp file containing a QHash<QString, QString> lookup table.
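
To illustrate what such a lookup table does, here is a small Python sketch. It is a simplification under stated assumptions and not the QGIS or PostgreSQL code: it assumes each rule line is "source, whitespace, target", that a line with only a source means the character is removed, and it ignores the quoted-target syntax. The names `parse_rules` and `unaccent` are hypothetical.

```python
from typing import Dict


def parse_rules(text: str) -> Dict[str, str]:
    """Parse simplified rule lines into a source -> target mapping.

    Assumption: fields are separated by whitespace and quoting is not handled.
    """
    rules = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        source = parts[0]
        # A line with only a source means the source is deleted.
        target = parts[1] if len(parts) > 1 else ""
        rules[source] = target
    return rules


def unaccent(text: str, rules: Dict[str, str]) -> str:
    """Replace the longest matching source at each position; copy the rest."""
    max_len = max((len(source) for source in rules), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so multi-character sources win.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in rules:
                out.append(rules[chunk])
                i += length
                break
        else:
            out.append(text[i])  # no rule applies, keep the character
            i += 1
    return "".join(out)


if __name__ == "__main__":
    sample_rules = parse_rules("\u0103\ta\n\u0219\ts\n\u021b\tt\n")
    print(unaccent("\u0219tiin\u021b\u0103", sample_rules))
```

A dictionary lookup keeps the sketch short; the actual implementation stores the same kind of mapping in the generated QHash<QString, QString>.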

Hope it's all good. I tested manually from QGIS and from PyQGIS, and all the tests passed (finally).
The logic is this: in the future, we can sync with PostgreSQL by downloading https://github.com/postgres/postgres/blob/master/contrib/unaccent/unaccent.rules into resources/unaccent.rules and then running scripts/generate_unaccent_rules.py, which outputs a new qgsunaccent_generated_rules.cpp. This way we don't need to maintain a generator script for the rules ourselves, as upstream (PostgreSQL) already does that at https://github.com/postgres/postgres/blob/master/contrib/unaccent/generate_unaccent_rules.py .
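
As a rough idea of the generation step, the sketch below turns a source -> target mapping into C++ source text. It is not the content of scripts/generate_unaccent_rules.py: the function names, the `RULES` variable name, and the exact layout of the emitted table are assumptions made for illustration.

```python
from typing import Dict


def cpp_literal(value: str) -> str:
    """Wrap a string in QStringLiteral, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'QStringLiteral( "{escaped}" )'


def render_table(rules: Dict[str, str]) -> str:
    """Render the mapping as a C++ QHash initializer (hypothetical layout)."""
    lines = [
        "// Generated file, do not edit by hand.",
        "static const QHash<QString, QString> RULES = {",
    ]
    # Sorting keeps the output stable, so regenerating gives small diffs.
    for source in sorted(rules):
        lines.append(f"  {{ {cpp_literal(source)}, {cpp_literal(rules[source])} }},")
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_table({"\u0103": "a", "\u00df": "ss"}))
```

Emitting the entries in sorted order is a design choice for this sketch: after a new upstream unaccent.rules is downloaded, the diff of the generated file then shows only the rules that actually changed.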

I really hope I got it right. It took me a lot of time to understand what was happening with the many special cases, and I still don't have a deep understanding of all of them, but I made sure that the output of the function in QGIS is the same as in PostgreSQL, so I'm satisfied without digging much deeper.

Thanks for reviewing this. I don't know yet why the clang-tidy check fails. Sorry for the noise. I restructured the commits.

Commits tagged with [need-docs] or [FEATURE]
