Written Number Parsing Not as Expected #1236

Open
darakelian opened this issue Sep 18, 2024 · 3 comments

Comments

@darakelian

Based on the library's description and its examples, I had assumed that spelled-out English numbers such as "twenty", "thirty", "forty", etc. would be parsed properly. After looking at the code, it seems that for English the library only parses the spelled-out numbers 1-12, which honestly was a surprise: https://github.com/scrapinghub/dateparser/blob/master/dateparser/data/date_translation_data/en.py#L789 Is there any way to support parsing something like "in twenty minutes"? Would you guys be open to a PR adding this extra support?

@gutsytechster
Collaborator

Hi @darakelian

Could you please provide an example where this doesn't work? It works fine for me:

>>> import dateparser
>>> from datetime import datetime
>>> datetime.now()
datetime.datetime(2024, 10, 27, 2, 10, 38, 594131)
>>> dateparser.parse('in 20 mins')
datetime.datetime(2024, 10, 27, 2, 30, 41, 701795)
>>> dateparser.parse('in 40 mins')
datetime.datetime(2024, 10, 27, 2, 50, 47, 316997)

@darakelian
Author

Hi, as I mentioned in the issue text, I am specifically seeing problems with the spelled-out versions (i.e. "twenty" instead of "20"), as can be seen here:

>>> import dateparser
>>> from datetime import datetime
>>> datetime.now()
datetime.datetime(2024, 10, 26, 20, 12, 53, 898538)
>>> dateparser.parse("in 20 minutes")
datetime.datetime(2024, 10, 26, 20, 33, 3, 641682)
>>> dateparser.parse("in twenty minutes")
>>> 

@terkalma

terkalma commented Dec 26, 2024

Yes, indeed the locale could be improved, but as a workaround you can also customize the simplifications to account for this.
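
For anyone who needs something before the locale data is extended, here is a minimal sketch of a different workaround: rewrite spelled-out numbers to digits before the string reaches the parser. It does not customize the simplifications mentioned above. The helper name `spelled_numbers_to_digits` and its lookup tables are hypothetical and not part of dateparser, and the sketch only covers English numbers 0-99.

```python
import re

# Hypothetical helper, NOT part of dateparser: converts spelled-out English
# numbers (0-99) to digits so the digit-based parsing path can handle them.
_UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
_TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}

_TENS_ALT = "|".join(_TENS)
_ONES_ALT = "|".join(w for w, v in _UNITS.items() if 1 <= v <= 9)
# Longest words first so "seventeen" is tried before "seven".
_UNITS_ALT = "|".join(sorted(_UNITS, key=len, reverse=True))

# Matches "twenty", "twenty-five", "twenty five", or a standalone 0-19 word.
_NUMBER_RE = re.compile(
    rf"\b(?:(?P<tens>{_TENS_ALT})(?:[-\s](?P<ones>{_ONES_ALT}))?"
    rf"|(?P<unit>{_UNITS_ALT}))\b",
    re.IGNORECASE,
)


def spelled_numbers_to_digits(text: str) -> str:
    """Replace spelled-out numbers 0-99 in `text` with their digit form."""

    def _replace(match: re.Match) -> str:
        if match.group("unit"):
            return str(_UNITS[match.group("unit").lower()])
        value = _TENS[match.group("tens").lower()]
        if match.group("ones"):
            value += _UNITS[match.group("ones").lower()]
        return str(value)

    return _NUMBER_RE.sub(_replace, text)


print(spelled_numbers_to_digits("in twenty minutes"))
print(spelled_numbers_to_digits("in forty-five minutes"))
```

The normalized string can then be passed along as `dateparser.parse(spelled_numbers_to_digits("in twenty minutes"))`, which should follow the same path as the "in 20 minutes" call shown earlier in the thread. Note that the sketch replaces every matching word, so a phrase like "no one" would also be rewritten; restrict it to the inputs you expect.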
