|
1 | | -# address formatting |
2 | | - |
3 | | -### Overview |
4 | | - |
5 | | -This project contains templates and test cases for address formats used in territories around the world. The templates can then be processed in any programming language ([see below for list of processors](#processing-logic)). |
6 | | - |
7 | | -### Build Status |
| 1 | +# Address Formatting |
8 | 2 |
|
9 | 3 | [Build Status](https://github.com/OpenCageData/address-formatting/actions/workflows/ci.yml) |
10 | 4 |
|
11 | | -### An example: |
| 5 | +Templates and test cases for address formats used in territories around the world. The templates can be processed in any programming language ([see list of processors](#processing-logic)). |
12 | 6 |
|
13 | | -Given a set of address parts like |
| 7 | +## Example |
14 | 8 |
|
15 | | - house_number: 17 |
16 | | - road: Rue du Médecin-Colonel Calbairac |
17 | | - neighbourhood: Lafourguette |
18 | | - suburb: Toulouse Ouest |
19 | | - postcode: 31000 |
20 | | - city: Toulouse |
21 | | - county: Toulouse |
22 | | - state: Midi-Pyrénées |
23 | | - country: France |
24 | | - country_code: FR |
| 9 | +Given a set of address parts: |
25 | 10 |
|
26 | | -we want to write logic to compile an address in the format consumers expect |
| 11 | +```yaml |
| 12 | +house_number: 17 |
| 13 | +road: Rue du Médecin-Colonel Calbairac |
| 14 | +neighbourhood: Lafourguette |
| 15 | +suburb: Toulouse Ouest |
| 16 | +postcode: 31000 |
| 17 | +city: Toulouse |
| 18 | +county: Toulouse |
| 19 | +state: Midi-Pyrénées |
| 20 | +country: France |
| 21 | +country_code: FR |
| 22 | +``` |
27 | 23 |
|
28 | | - 17 Rue du Médecin-Colonel Calbairac |
29 | | - 31000 Toulouse |
30 | | - France |
| 24 | +We want to compile an address in the format consumers expect: |
31 | 25 |
|
32 | | -### Why would you want to do this? |
| 26 | +``` |
| 27 | +17 Rue du Médecin-Colonel Calbairac |
| 28 | +31000 Toulouse |
| 29 | +France |
| 30 | +``` |
33 | 31 |
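As a rough illustration of the example above, the compilation step might look like the following Python sketch. The `LAYOUT` structure and the `format_address` function are hypothetical names invented for this illustration; they are not the repository's template format, which is Mustache-based and processed by the implementations listed under Processing Logic.

```python
# Minimal sketch under stated assumptions: LAYOUT and format_address are
# hypothetical and only mimic the example output; they are not taken from
# this repository's templates.

PARTS = {
    "house_number": "17",
    "road": "Rue du Médecin-Colonel Calbairac",
    "neighbourhood": "Lafourguette",
    "suburb": "Toulouse Ouest",
    "postcode": "31000",
    "city": "Toulouse",
    "county": "Toulouse",
    "state": "Midi-Pyrénées",
    "country": "France",
    "country_code": "FR",
}

# Each inner list is one output line; each string is a component key.
LAYOUT = [["house_number", "road"], ["postcode", "city"], ["country"]]


def format_address(parts, layout=LAYOUT):
    """Join the available components line by line, skipping missing ones."""
    lines = []
    for keys in layout:
        values = [str(parts.get(key) or "").strip() for key in keys]
        values = [value for value in values if value]
        if values:  # drop lines that end up empty (incomplete data)
            lines.append(" ".join(values))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_address(PARTS))
```

Because missing components are skipped, the same function also copes with incomplete input such as a record that has only `city` and `country`.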
|
34 | | -The intended use case is database or geocoding systems (forward, reverse, autocomplete) where we know both the country of the address and the language of the user/reader. The address is displayed to a consumer (for example in an app) and not used to print on an envelope for actual postal delivery. We use it to format output from the [OpenCage Geocoding API](https://opencagedata.com/api). |
| 32 | +## Why Use This? |
35 | 33 |
|
36 | | -### Which addresses are we talking about? |
| 34 | +The intended use case is database or geocoding systems (forward, reverse, autocomplete) where we know both the country of the address and the language of the user/reader. The address is displayed to a consumer (for example in an app) and not used to print on an envelope for actual postal delivery. We use it to format output from the [OpenCage Geocoding API](https://opencagedata.com/api). |
37 | 35 |
|
38 | | -We have to deal with |
| 36 | +## Scope |
39 | 37 |
|
40 | | - * incomplete data |
41 | | - * anything with a name (peaks, bridges, bus stops) |
| 38 | +**What we handle:** |
| 39 | +- Incomplete data |
| 40 | +- Anything with a name (peaks, bridges, bus stops) |
42 | 41 |
|
43 | | -Unlike [physical post (office) mail](http://www.bitboost.com/ref/international-address-formats.html) we don't have to deal with |
| 42 | +**What we don't handle** (unlike [physical postal mail](http://www.bitboost.com/ref/international-address-formats.html)): |
| 43 | +- Apartment/flat numbers, floor numbers |
| 44 | +- PO boxes |
| 45 | +- Translating the destination address language (whatever language is input is output) |
44 | 46 |
|
45 | | - * apartment/flat number, floor numbers |
46 | | - * PO boxes |
47 | | - * translating the language of the (destination) address. Whatever language is input is output. |
48 | | - |
49 | | -### Processing logic |
| 47 | +## Processing Logic |
50 | 48 |
|
51 | | -Our goal with this repository is a series of (programming) language independent templates. Those templates can then be processed by whatever software you like. |
| 49 | +Our goal is a series of programming language-independent templates that can be processed by whatever software you like. |
52 | 50 |
|
53 | | -There are open-source implementations in |
| 51 | +### Open-Source Implementations |
54 | 52 |
|
55 | | - * [Android library](https://github.com/woheller69/AndroidAddressFormatter) |
56 | | - * [Elixir](https://github.com/dkuku/ex_address_formatting) |
57 | | - * [Go](https://github.com/timonmasberg/address-formatter) |
58 | | - * [Java](https://github.com/placemarkt/address-formatter-java) |
59 | | - * [Javascript](https://github.com/fragaria/address-formatter) |
60 | | - * [Kotlin](https://github.com/bettermile/address-formatter-kotlin) |
61 | | - * [Perl](https://metacpan.org/release/Geo-Address-Formatter) |
62 | | - * [PHP](https://github.com/predicthq/address-formatter-php) |
63 | | - * [PowerShell](https://github.com/GruberMarkus/AddressFormatter) cross-platform |
64 | | - * [Python (no longer maintained)](https://github.com/pudo/addressformatting/tree/master) |
65 | | - * [Ruby](https://github.com/mirubiri/address_composer) |
66 | | - * [Rust (no longer maintained)](https://github.com/antoine-de/address-formatter-rs) |
67 | | - * [Scala](https://github.com/ben-willis/address-formatter) |
| 53 | +| Language | Repository | Notes | |
| 54 | +|----------|------------|-------| |
| 55 | +| Android | [AndroidAddressFormatter](https://github.com/woheller69/AndroidAddressFormatter) | | |
| 56 | +| Elixir | [ex_address_formatting](https://github.com/dkuku/ex_address_formatting) | | |
| 57 | +| Go | [address-formatter](https://github.com/timonmasberg/address-formatter) | | |
| 58 | +| Java | [address-formatter-java](https://github.com/placemarkt/address-formatter-java) | | |
| 59 | +| JavaScript | [address-formatter](https://github.com/fragaria/address-formatter) | | |
| 60 | +| Kotlin | [address-formatter-kotlin](https://github.com/bettermile/address-formatter-kotlin) | | |
| 61 | +| Perl | [Geo-Address-Formatter](https://metacpan.org/release/Geo-Address-Formatter) | | |
| 62 | +| PHP | [address-formatter-php](https://github.com/predicthq/address-formatter-php) | | |
| 63 | +| PowerShell | [AddressFormatter](https://github.com/GruberMarkus/AddressFormatter) | Cross-platform | |
| 64 | +| Python | [addressformatting](https://github.com/pudo/addressformatting/tree/master) | No longer maintained | |
| 65 | +| Ruby | [address_composer](https://github.com/mirubiri/address_composer) | | |
| 66 | +| Rust | [address-formatter-rs](https://github.com/antoine-de/address-formatter-rs) | No longer maintained | |
| 67 | +| Scala | [address-formatter](https://github.com/ben-willis/address-formatter) | | |
68 | 68 |
|
69 | | -We would love more language implementations. The more people who use the templates, the more likely bugs will be reported. |
| 69 | +We welcome more language implementations. The more people who use the templates, the more likely bugs will be reported. |
70 | 70 |
|
71 | | -If you write a processor, please submit a pull request adding your processor to the list. |
| 71 | +**If you write a processor**, please submit a pull request adding it to the list. Include this repo as a [git submodule](https://git-scm.com/book/en/v2/Git-Tools-Submodules) so we all use the same templates/configuration and stay in sync. See [how we do it in the Perl parser](https://github.com/OpenCageData/perl-Geo-Address-Formatter/blob/master/README.md#installation) for an example. |
72 | 72 |
|
73 | | -One key point: please include this repo as a [git submodule](https://git-scm.com/book/en/v2/Git-Tools-Submodules), so we all use the same templates/configuration and don't get out of sync. if you are unfamiliar with git submodules, please have a look at [how we do it in the Perl parser](https://github.com/OpenCageData/perl-Geo-Address-Formatter/blob/master/README.md#installation). |
| 73 | +## International Coverage |
74 | 74 |
|
75 | | -Thanks! |
| 75 | +As of March 2024: |
76 | 76 |
|
77 | | -### International coverage |
| 77 | +| Metric | Count | |
| 78 | +|--------|-------| |
| 79 | +| Known territories | 251 | |
| 80 | +| Territories with tests | 251 (100%) | |
| 81 | +| Territories with rules | 251 (100%) | |
| 82 | +| Territories without rules or tests | 0 (0%) | |
78 | 83 |
|
79 | | -As of March 2024 coverage is: |
| 84 | +This output is generated by `bin/coverage.pl`. Run `bin/coverage.pl -d` for a detailed breakdown. |
80 | 85 |
|
81 | | - We are aware of 251 territories |
82 | | - We have at least one test for 251 (100%) territories |
83 | | - We have rules for 251 (100%) territories |
84 | | - 0 (0%) territories have neither rules nor tests |
85 | | - |
86 | | -This output is generated by `bin/coverage.pl` |
| 86 | +The list of all known territories is in `conf/country_codes.yaml`. |
87 | 87 |
|
88 | | -We need more language specific abbreviations. Please see `conf/abbreviations`. Pull requests gladly received. |
| 88 | +> **Note:** The list contains all officially assigned [ISO 3166-1 alpha-2 codes](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Officially_assigned_code_elements). This is not a political statement about the status of any territory. |
89 | 89 |
|
90 | | -A detailed breakdown of test and configuration coverage can be found by running `bin/coverage.pl -d`. A list of all known territories is in `conf/country_codes.yaml` |
| 90 | +**We need more language-specific abbreviations.** See `conf/abbreviations`. Pull requests welcome! |
91 | 91 |
|
92 | | -_Please note: the list is simple all officially assigned [ISO 3166-1 alpha-2 codes](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Officially_assigned_code_elements), and is not a political statement on whether or not these territories are or are not or should or should not be political states._ |
| 92 | +## File Format |
93 | 93 |
|
94 | | -### File format |
| 94 | +- **Configuration:** [YAML](http://yaml.org/) format |
| 95 | +- **Templates:** [Mustache](http://mustache.github.io/) with one variation: `{#first}` sections take the first alternative for which a variable could be interpolated |
95 | 96 |
|
96 | | -The files are in [YAML](http://yaml.org/) format. The templates are written in [Mustache](http://mustache.github.io/) with a minor variation: the `{#first}` sections will take the first alternative for which a variable could be interpolated. Both formats are human readable, strict, solve escaping and support comments. YAML allows references (called "ankers") to avoid copy&paste, Mustache allows sub-templates (called "partials"). |
| 97 | +Both formats are human-readable, strict, handle escaping, and support comments. YAML allows references ("anchors") to avoid duplication; Mustache allows sub-templates ("partials"). |
97 | 98 |
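The "first alternative for which a variable could be interpolated" behaviour can be sketched in Python as below. This is an illustration only: `first_available` is a hypothetical helper, and it assumes each alternative is a single component key, which may be simpler than what the real templates allow.

```python
# Minimal sketch of the `first` idea, assuming each alternative is one
# component key. first_available is a hypothetical name, not part of any
# processor's API.

def first_available(parts, alternatives):
    """Return the value of the first alternative that has a non-empty value."""
    for key in alternatives:
        value = str(parts.get(key) or "").strip()
        if value:
            return value
    return ""  # nothing could be interpolated


if __name__ == "__main__":
    parts = {"suburb": "Toulouse Ouest", "city": "Toulouse"}
    # "town" is missing, so the next alternative, "city", is used.
    print(first_available(parts, ["town", "city", "suburb"]))
```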
|
98 | | -### How to add your country/territory |
| 99 | +## How to Add Your Country/Territory |
99 | 100 |
|
100 | | -1. edit the .yaml testcase for the country/territory in `testcases/countries`. The file names correspond to the appropriate ISO 3166-1 alpha-2 code - see `conf/country_codes.yaml` |
101 | | - * a good way to get sample data is: |
102 | | - * find an addressed location (house, business, etc) in your |
103 | | - target territory in OpenStreetMap |
104 | | - * get the coordinates (lat, long) of the location |
105 | | - * put the coordinates into the [OpenCage Geocoding API demo page](https://opencagedata.com/demo) |
106 | | - * look at the resulting JSON in the *Raw Response* tab |
| 101 | +### Step 1: Create Test Cases |
107 | 102 |
|
108 | | -2. edit `conf/countries/worldwide.yaml` |
109 | | - * Possibly your country/territory uses an existing generic format as |
110 | | - defined at the top of the file. If so, great! Just map your |
111 | | - country_code to the generic template. You may still want to add |
112 | | - clean up code (see the entry for `DE` as an example). |
113 | | - * If not, you need to define a new rule set (may or may not be generic) |
114 | | - * You may also need to define new state/region mappings in `conf/state_codes.yaml` |
| 103 | +Edit the `.yaml` testcase for the country/territory in `testcases/countries`. File names correspond to [ISO 3166-1 alpha-2 codes](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) (see `conf/country_codes.yaml`). |
115 | 104 |
|
116 | | -3. to test you will now need to process the .yaml test via a processor |
117 | | - (see above) and ensure the input leads to the desired output. |
118 | | - We also run these checks automatically against pull requests to ensure against regressions. |
| 105 | +**To get sample data:** |
| 106 | +1. Find an addressed location (house, business, etc.) in your target territory on [OpenStreetMap](https://www.openstreetmap.org) |
| 107 | +2. Get the coordinates (lat, long) |
| 108 | +3. Enter the coordinates into the [OpenCage Geocoding API demo](https://opencagedata.com/demo) |
| 109 | +4. Check the resulting JSON in the *Raw Response* tab |
119 | 110 |
|
120 | | -If in doubt, please get in touch by submitting an issue. |
| 111 | +### Step 2: Define Formatting Rules |
121 | 112 |
|
122 | | -### Formatting rules |
| 113 | +Edit `conf/countries/worldwide.yaml`: |
123 | 114 |
|
124 | | -Currently we support the following formatting rules: |
| 115 | +- **If your territory uses an existing generic format** (defined at the top of the file): map your `country_code` to the generic template. You may still want to add cleanup code (see the `DE` entry as an example). |
| 116 | +- **If not**: define a new rule set (which may or may not be generic). You may also need to define new state/region mappings in `conf/state_codes.yaml`. |
125 | 117 |
|
126 | | -* `replace:` regex that operates on the input values, useful for removing bureaucratic cruft like "London Borough of ". Note if you define the regex starting with format _X=_, for example _city=_ it should operate only on values with that key |
127 | | -* `postformat_replace:` regex that operates on the final output |
128 | | -* `add_component:` with a value of the form `component=XXXX` |
129 | | -* `change_country:` change the country value of the input, useful for dependent territories. Can include a substitution like `$state` so that that component value is then inserted into the new country value. See `testcases/countries/sh.yaml` for an example. |
130 | | -* `use_country:` use the formating configuration of another country, useful for dependent territories to avoid duplicating configuration |
| 118 | +### Step 3: Test |
131 | 119 |
|
132 | | -### The future |
| 120 | +Process the `.yaml` test via a processor (see above) and verify the input produces the desired output. We run these checks automatically against pull requests to prevent regressions. |
133 | 121 |
|
134 | | -More tests! For every rule about addresses there are exceptions and edge cases to consider. More test cases are always needed. |
| 122 | +**Questions?** Submit an issue. |
135 | 123 |
|
136 | | -Planned features: |
| 124 | +## Formatting Rules |
137 | 125 |
|
138 | | - * basic error checking, for example ignore things which obviously can not be postcodes |
139 | | - * define rules for postcode format specifically |
| 126 | +| Rule | Description | |
| 127 | +|------|-------------| |
| 128 | +| `replace:` | Regex operating on input values. Useful for removing bureaucratic cruft like "London Borough of". Prefix with `key=` (e.g., `city=`) to operate only on that key. | |
| 129 | +| `postformat_replace:` | Regex operating on the final output. | |
| 130 | +| `add_component:` | Add a component with format `component=XXXX`. | |
| 131 | +| `change_country:` | Change the country value of the input. Useful for dependent territories. Supports substitutions like `$state`. See `testcases/countries/sh.yaml` for an example. | |
| 132 | +| `use_country:` | Use the formatting configuration of another country. Useful for dependent territories to avoid duplicating configuration. | |
140 | 133 |
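To make the `replace:` rule more concrete, here is a hedged Python sketch of how a processor might apply it. The rule representation (a list of `(pattern, replacement)` pairs) and the `apply_replace` function are assumptions made for this illustration; consult an actual processor for the real behaviour.

```python
import re

# Assumption: a rule scoped to one key is written as "key=regex".
KEY_PREFIX = re.compile(r"^(\w+)=(.+)$", re.DOTALL)


def apply_replace(parts, rules):
    """Apply (pattern, replacement) rules to string component values.

    Hypothetical sketch: a pattern starting with "key=" only touches that
    key; any other pattern is applied to every component value.
    """
    result = dict(parts)  # leave the caller's input untouched
    for pattern, replacement in rules:
        scoped = KEY_PREFIX.match(pattern)
        if scoped:
            key, regex = scoped.groups()
            if key in result:
                result[key] = re.sub(regex, replacement, result[key])
        else:
            for key in list(result):
                result[key] = re.sub(pattern, replacement, result[key])
    return result


if __name__ == "__main__":
    parts = {
        "city": "London Borough of Camden",
        "county": "London Borough of Camden",
    }
    # Only the city value is cleaned; county keeps the prefix.
    print(apply_replace(parts, [("city=^London Borough of ", "")]))
```

Scoping a rule to one key keeps a cleanup from accidentally rewriting other components that happen to contain the same text.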
|
141 | | -We welcome your pull requests. Together we can address the world! |
| 134 | +## Roadmap |
142 | 135 |
|
143 | | -### License |
| 136 | +More tests are always needed. For every rule about addresses there are exceptions and edge cases. |
144 | 137 |
|
145 | | -This project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details |
| 138 | +**Planned features:** |
| 139 | +- Basic error checking (e.g., ignore values that obviously cannot be postcodes) |
| 140 | +- Rules for postcode format validation |
146 | 141 |
|
147 | | -### Additional resources |
| 142 | +We welcome your pull requests. Together we can address the world! |
148 | 143 |
|
149 | | -If you are working with addresses you may need [lists of random addresses/postcodes/coordinates](https://opencagedata.com/tools/address-lists) (either in general or for specific countries) for testing. |
| 144 | +## License |
150 | 145 |
|
151 | | -### Further reading on the challenge of address |
| 146 | +MIT License - see [LICENSE.txt](LICENSE.txt) for details. |
152 | 147 |
|
153 | | -Here's [our blog post anouncing this project](https://blog.opencagedata.com/post/99059889253/good-looking-addresses-solving-the-berlin-berlin) and the motivations behind it. |
| 148 | +## Resources |
154 | 149 |
|
155 | | -You may enjoy Michael Tandy's [Falsehoods Programmers Believe about Addresses](http://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/). |
| 150 | +### Testing Data |
156 | 151 |
|
157 | | -If it's actual address data you're after, check out [OpenStreetMap](https://www.openstreetmap.org) and [OpenAddresses](http://openaddresses.io/). |
| 152 | +[Lists of random addresses/postcodes/coordinates](https://opencagedata.com/tools/address-lists) for testing (general or country-specific). |
158 | 153 |
|
159 | | -If you want to turn longitude, latitude into well formatted addresses or placenames, well that's what a geocoder does. Check out ours: [OpenCage Geocoder](https://opencagedata.com). |
| 154 | +### Further Reading |
160 | 155 |
|
161 | | -If all this convinces you that address are evil, please check out [what3words](http://what3words.com) which allows you to dispense with them entirely. |
| 156 | +- [Our blog post announcing this project](https://blog.opencagedata.com/post/99059889253/good-looking-addresses-solving-the-berlin-berlin) and the motivations behind it |
| 157 | +- [Falsehoods Programmers Believe about Addresses](http://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/) by Michael Tandy |
162 | 158 |
|
163 | | -### Who is OpenCage GmbH? |
| 159 | +### Related Projects |
164 | 160 |
|
165 | | -<a href="https://opencagedata.com"><img src="opencage_logo_300_150.png"></a> |
| 161 | +- [OpenStreetMap](https://www.openstreetmap.org) - Open address data |
| 162 | +- [OpenAddresses](http://openaddresses.io/) - Open address data |
| 163 | +- [OpenCage Geocoder](https://opencagedata.com) - Convert coordinates to formatted addresses |
| 164 | +- [what3words](http://what3words.com) - An alternative to traditional addresses |
166 | 165 |
|
167 | | -We run a worldwide [geocoding API](https://opencagedata.com/api) and [geosearch](https://opencagedata.com/geosearch) service based on open data. |
168 | | -Learn more [about us](https://opencagedata.com/about). |
| 166 | +--- |
169 | 167 |
|
170 | | -We also organize [Geomob](https://thegeomob.com), a series of regular meetups for location based service creators, where we do our best to highlight geoinnovation. If you like geo stuff, you will probably enjoy [the Geomob podcast](https://thegeomob.com/podcast/). |
| 168 | +## About OpenCage GmbH |
171 | 169 |
|
| 170 | +<a href="https://opencagedata.com"><img src="opencage_logo_300_150.png" alt="OpenCage logo"></a> |
172 | 171 |
|
| 172 | +We run a worldwide [geocoding API](https://opencagedata.com/api) and [geosearch](https://opencagedata.com/geosearch) service based on open data. [Learn more about us](https://opencagedata.com/about). |
173 | 173 |
|
| 174 | +We also organize [Geomob](https://thegeomob.com), a series of regular meetups for location-based service creators. If you like geo stuff, check out [the Geomob podcast](https://thegeomob.com/podcast/). |