-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
📚 Inaccurate pre-trained model predictions master thread #3052
Comments
@Woodchucks: We also noticed this, and it appears to be a problem related to the whitespace augmentation in the training settings for a tagger that's trained on its own rather than with a shared To be honest the behavior is pretty bizarre and surprising. It doesn't show up (at least not enough to lead to much lower The upcoming v3.5.0 trained pipelines for Polish should improve this by adding |
@adrianeboyd Thank you for the fast reply. I didn't notice your respond so I've deleted my comment and published it again as issue #12002. Sorry for the inconvenience. Glad to hear that the new version will have the IS_SPACE feature implemented. |
Hi, I encountered an issue where in German the token "20-Plus" is wrongly tagged as "SPACE", which could hint towards a data issue: |
This is related to the same underlying issue as #12002, where data augmentation involving whitespace seems to sometimes lead to unknown words being tagged as Maybe we should just add |
Hello ! Following the answer I got in this discussion, I'm reposting my issue on this master thread. Here are 2 examples with different versions of the model done in a Linux environment with python 3.10. spacy-transformers == 1.2.0
spacy == 3.5.0
fr_dep_news_trf == 3.5.0 > doc = nlp("Je vais skier dans les Alpes de France cet hiver.")
> [(i.lemma_, i.pos_) for i in doc if i.text == "Alpes"]
[('Alpes', 'PROPN')]
> doc = nlp("Je vais skier dans les Alpes de France cet hiver. " *10)
> [(i.lemma_, i.pos_) for i in doc if i.text == "Alpes"]
[('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN')] With another version, there is far less wrong predictions but still some at some point. spacy-transformers == 1.1.9
spacy == 3.4.4
fr_dep_news_trf == 3.4.0 > doc = nlp("Je vais skier dans les Alpes de France cet hiver.")
> [(i.lemma_, i.pos_) for i in doc if i.text == "Alpes"]
[('Alpes', 'PROPN')]
> doc = nlp("Je vais skier dans les Alpes de France cet hiver. " *10)
> [(i.lemma_, i.pos_) for i in doc if i.text == "Alpes"]
[('alpe', 'NOUN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('alpe', 'NOUN')] I'd like to know if it is expected from the model or not. Like, is this just because I don't give it enough context or something else. Thank you for your help! |
Spacy's English named entity recognition has issues with apostrophes.
This returns [ORG] for Megan insetad of [PERSON]. Similar issues occur with, for example, the word "Applebee's". |
Thanks for reporting this, @postnubilaphoebus. The small model being doesn't do that well with names not occuring often enough in the training data. I recommend giving |
Hi! We've spotted some NSUBJ/DOBJ mixups with parsing sentences using en_core_web_trf (3.5) that start with Make: For example:
Outputs:
There should not be an nsubj in this sentence. 'Make(ROOT) the(det) compression(dobj) used(acl) between(prep) map(nmod) reduce(compound) tasks(pobj) configurable(ccomp) .(punct)' Other examples include:
All of these put an nsubj where there should be a dobj. Note, I tested 3.3.4, and 3.4.4 and they seemed to do the same thing |
Imperatives and questions are two very common things that most of our trained pipelines perform poorly on because they are rare in typical newspaper training data. |
Hi @cbowdon, OntoNotes does contain NER annotation, see: https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf |
@adrianeboyd Thank you! |
Hi there! I've come across an anomaly in the parsing component of the 'en_core_web_sm' model. Specifically, I've noticed that the verb 'need' is sometimes labeled as the root of the sentence, while in other cases, it's labeled as an 'aux'. Even more strangely, when the same sentence is repeated twice or more, the behavior of the parsing component becomes erratic. Take this example: "the member states need not do something. the member states need not do something." In the first sentence, the subject is a "child" of the root verb 'do', while in the second sentence (which is identical!), the subject is the child of the 'aux'. I've tried to replicate this behavior with other examples, but the anomaly is not always present. I'd appreciate any insights or suggestions on whether you think this could arise in other circumstances as well. Thanks! |
Hi @giova-p, yes, the predictions of the statistical models depend on a context window that can go beyond a single sentence, so you will see differences like this in practice. A pipeline should output the same predictions for the exact same input text string every time, but if anything is modified in the text, even adding whitespace, you may see different predictions. |
I have identified a discrepancy in the entities detected by the "en_ner_bc5cdr_md-0.5.1" model between results obtained from a Windows system and an Ubuntu system. According to the readme file of the "en_ner_bc5cdr_md-0.5.1" model, it is trained up to Spacy version 3.5.0. Interestingly, this alignment holds true for the Windows system. Whenever I adjust the Spacy version to a value above 3.5.0, the named entity recognition (NER) results are no longer produced. The model en_ner_bc5cdr_md-0.5.0 worked irrespective of the spacy version. However, an interesting scenario emerged when I conducted the same experiment on an Ubuntu system. Here, the "en_ner_bc5cdr_md-0.5.1" model generated NER outputs regardless of the Spacy version I employed. I even tested it with versions like 3.6.1 and even lower than that. This leads me to the question: Why is this discrepancy in behavior occurring between the Windows and Ubuntu systems? Is this a known issue? Am I missing something?? |
Hi @Arjuman23, If I understand you correctly, both From the release notes, I gather that the In general, you can run |
Hi @svlandeg, |
You could contact them through their issue tracker, but to be honest I'm not sure there's a bug to be solved here. The expected behaviour is that the models work within their range, and not outside of it. It might accidentally do work on some systems outside of the "correct" spaCy range, for various reasons I'm not sure of. Again, you can ask them / report this to them, but I don't think there's something to be fixed here (I agree it's weird behaviour though). |
I'm not sure if this counts as a pre-trained model prediction given that the tokenizer is rule-based, but it looks like spaCy's English tokenizer splits the verb "wed". See below: If this isn't a mistake, I can imagine it might be a way to deal with common typos of edit: the same thing happens with the noun |
|
Hey, here are some inaccurate parses I encountered (all using spacy version 3.7.2):
|
The following Portuguese sentences, which all have a verb capitalized to start the sentence, result in an incorrect lemma for the verb (pt_core_news_lg, spacy 3.7.2)
In each case, the lemma of the first word is given as the word unchanged. If the first word is lower cased, the correct lemmas are produced ( |
The Portuguese word |
In all the Spanish models I’ve tried, from small to large, the lemma of |
Common
It is a systematic error, Something similar happens with |
This thread is a master thread for collecting problems and reports related to incorrect and/or problematic predictions of the pre-trained models.
Why a master thread instead of separate issues?
GitHub now supports pinned issues, which lets us create master threads more easily without them getting buried.
Users often report issues that come down to incorrect predictions made by the pre-trained statistical models. Those are all good and valid, and can include very useful test cases. However, having a lot of open issues around minor incorrect predictions across various languages also makes it more difficult to keep track of the reports. Unlike bug reports, they're much more difficult to action on. Sometimes, mistakes a model makes can indicate deeper problems that occurred during training or when preprocessing the data. Sometimes they can give us ideas for how to use data augmentation to make the models less sensitive to very small variations like punctuation or capitalisation.
Other times, it's just something we have to accept. A model that's 90% accurate will make a mistake on every 10th prediction. A model that's 99% accurate will be wrong once every 100 predictions.
The main reason we distribute pre-trained models is that it makes it easier for users to build their own systems by fine-tuning pre-trained models on their data. Of course, we want them to be as good as possible, and we're always optimising for the best compromise of speed, size and accuracy. But we won't be able to ship pre-trained models that are always correct on all data ever.
For many languages, we're also limited by the resources available, especially when it comes to data for named entity recognition. We've already made substantial investments into licensing training corpora, and we'll continue doing so (including running our own annotation projects with Prodigy ✨) – but this will take some time.
Reporting incorrect predictions in this thread
If you've come across suspicious predictions in the pre-trained models (tagger, parser, entity recognizer) or you want to contribute test cases for a given language, feel free to submit them here. (Test cases should be "fair" and useful for measuring the model's general accuracy, so single words, significant typos and very ambiguous parses aren't usually that helpful.)
You can check out our new models test suite for spaCy
v2.1.0
to see the tests we're currently running.The text was updated successfully, but these errors were encountered: