Do all the toponyms exist in OSM (city, state, region names, etc.)?
yes
If the address uses a rare/uncommon format, does changing the order of the fields yield the correct result?
no
Here's what I think could be improved
Is it possible to specify that French postcodes are of the form (\d[0-9aAbB]\d{3}) when parsing?
The codes '2A' and '2B' correspond to the two Corsican departments in France. OpenStreetMap treats them as '20', but this is not the reality.
Is it possible to configure libpostal to recognise this pattern?
Yes, I'm guessing that postcode format doesn't exist in the training data (you can type .print_features in the address_parser CLI and then try an address to see what the model is doing and where it might get stuck). Libpostal is not based on regex, other than to split strings into words. Using 20250 works, for instance, because it is a common postcode format, and we also have some geographic context dictionaries that help identify postal codes from known geographic contexts (which probably include the 20250 version as well).
You can use a regex to extract/remove postcodes following that pattern and reparse the remainder; e.g. something like this will usually work. If you're sending the result to Elasticsearch, you can add the extracted postcode back in if needed (a postcode may be more selective than a city, etc.).
1 rue saint roch poggio-di-venaco
{
"house_number": "1",
"road": "rue saint roch",
"city": "poggio-di-venaco"
}
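The extract-and-reparse workaround could look roughly like the sketch below. It only covers the regex step, using the pattern from the question; the helper name `split_postcode` is hypothetical and not part of libpostal, and the sketch assumes the first five-character token matching the pattern is the postcode, which will not hold for every address.

```python
import re

# Pattern taken from the question: a digit, then a digit or A/B
# (for the Corsican codes), then three digits, as a standalone token.
POSTCODE_RE = re.compile(r"\b\d[0-9aAbB]\d{3}\b")


def split_postcode(address):
    """Return (postcode, remainder); postcode is None if nothing matches.

    Hypothetical helper for illustration only, not a libpostal API.
    """
    match = POSTCODE_RE.search(address)
    if match is None:
        return None, address
    # Cut the matched token out and collapse the leftover whitespace.
    remainder = address[:match.start()] + " " + address[match.end():]
    return match.group(0), " ".join(remainder.split())


postcode, remainder = split_postcode("1 rue saint roch 2B238 poggio-di-venaco")
print(postcode)
print(remainder)
```

The remainder can then be passed to parse_address as usual, and the extracted postcode added back to the parsed components before they are sent to Elasticsearch.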
Hi!
I was checking out libpostal, and saw something that could be improved.
My country is France
Here's how I'm using libpostal
We use libpostal to parse addresses before searching with Elasticsearch.
Here's what I did
from postal.parser import parse_address
parse_address('1 rue saint roch 2B238 poggio-di-venaco', language='fr', country='fr')
Here's what I got
[('1', 'house_number'),
('rue saint roch 2b238', 'road'),
('poggio-di-venaco', 'city')]
Here's what I was expecting
[('1', 'house_number'),
('rue saint roch', 'road'),
('2b238','postcode'),
('poggio-di-venaco', 'city')]
For parsing issues, please answer "yes" or "no" to all that apply.
no
yes
no