Some quirks when parsing a general text... #135
Comments
There are some legitimate problems with the parse output of this text, thanks for the sample! I will have a look into certain issues.
Disambiguation is not perfect yet, as shown by the "minute of arc" interpretation. Still working on improving this...
In passing, I also just spotted this natural language time parsing package, ctparse, but I've not had a chance to play with it yet.
I had several similar issues. The weirdest was 'PayPal' being parsed into 'petayear year petayear litre'. Is there a way to force quantulum to stick to basic units and not try to guess these combinations? Or is there any way to change its behavior to adapt it to my situation?
I agree, there should be a parameter to disable parsing of non-space-separated combined units. Also, maybe a way to pass a list of custom (application-specific) words that are not to be interpreted as units.
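Until such an option exists, one possible workaround is to post-filter the parse results against an application-specific ignore list. The following is only a rough sketch under stated assumptions: `ParsedQuantity`, `ignored_spans` and `drop_ignored` are hypothetical names made up for illustration, not part of quantulum's API, and the sketch assumes each parsed result exposes the matched text and its character offsets as `surface` and `span`.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ParsedQuantity:
    """Hypothetical stand-in for a parsed result (not quantulum's own class)."""
    surface: str           # the matched text, e.g. "5 dollars"
    span: Tuple[int, int]  # (start, end) character offsets in the source text


def ignored_spans(text: str, ignore_words: Iterable[str]) -> List[Tuple[int, int]]:
    """Find the character spans of every ignored word (case-insensitive)."""
    spans = []
    for word in ignore_words:
        if not word:
            continue  # an empty pattern would match at every word boundary
        pattern = r"\b" + re.escape(word) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append(match.span())
    return spans


def drop_ignored(text: str,
                 quantities: Iterable[ParsedQuantity],
                 ignore_words: Iterable[str]) -> List[ParsedQuantity]:
    """Keep only the quantities that do not overlap an ignored word."""
    blocked = ignored_spans(text, ignore_words)
    kept = []
    for quantity in quantities:
        start, end = quantity.span
        overlaps = any(start < b_end and b_start < end
                       for b_start, b_end in blocked)
        if not overlaps:
            kept.append(quantity)
    return kept
```

With something like this, a result whose span falls inside 'PayPal' would be dropped when "paypal" is on the ignore list, while results elsewhere in the text are left untouched. It does nothing about the underlying disambiguation, it only hides the false positives after the fact.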
I'll see whether I can find the time to do it. On another note: is editing the entities.json or units.json files the only way to add custom units? Or is there a way to do it from Python?
Currently this is the easiest way without changing the source code of the project.
@alberto-bracci with #186 there will be an option to add custom entities and units to quantulum3 without any hassle :) Sorry for the delay, but this required some reworking of the inner quantulum structure that was pending anyway.
I wrote a simple story and it threw up some interesting numbers...
returns:
and
parser.parse(text)
returns: