📚 Inaccurate pre-trained model predictions master thread #3052

ines · 2018-12-14T11:11:47Z

This thread is a master thread for collecting problems and reports related to incorrect and/or problematic predictions of the pre-trained models.

Why a master thread instead of separate issues?

GitHub now supports pinned issues, which lets us create master threads more easily without them getting buried.

Users often report issues that come down to incorrect predictions made by the pre-trained statistical models. Those are all good and valid, and can include very useful test cases. However, having a lot of open issues around minor incorrect predictions across various languages also makes it more difficult to keep track of the reports. Unlike bug reports, they're much more difficult to act on. Sometimes, mistakes a model makes can indicate deeper problems that occurred during training or when preprocessing the data. Sometimes they can give us ideas for how to use data augmentation to make the models less sensitive to very small variations like punctuation or capitalisation.

Other times, it's just something we have to accept. A model that's 90% accurate will make a mistake on every 10th prediction. A model that's 99% accurate will be wrong once every 100 predictions.
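
As a rough illustration of that arithmetic, here is a minimal sketch. It assumes every prediction is independently correct with the same probability, which real model errors usually are not, and the function names are made up for this example.

```python
def expected_errors(accuracy: float, n_predictions: int) -> float:
    """Expected number of wrong predictions, assuming each prediction
    is independently correct with probability `accuracy`."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * n_predictions


def chance_all_correct(accuracy: float, n_predictions: int) -> float:
    """Probability that all `n_predictions` are correct,
    under the same independence assumption."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return accuracy ** n_predictions


# Roughly 10 and 1 expected mistakes per 100 predictions, respectively.
print(round(expected_errors(0.90, 100), 2))
print(round(expected_errors(0.99, 100), 2))
# Even at 99% per-token accuracy, a 20-token sentence is not guaranteed
# to come out fully correct.
print(round(chance_all_correct(0.99, 20), 3))
```

The second function is the reason a single wrong tag in a longer text is not, on its own, evidence of a bug.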

The main reason we distribute pre-trained models is that it makes it easier for users to build their own systems by fine-tuning pre-trained models on their data. Of course, we want them to be as good as possible, and we're always optimising for the best compromise of speed, size and accuracy. But we won't be able to ship pre-trained models that are always correct on all data ever.

For many languages, we're also limited by the resources available, especially when it comes to data for named entity recognition. We've already made substantial investments into licensing training corpora, and we'll continue doing so (including running our own annotation projects with Prodigy ✨) – but this will take some time.

Reporting incorrect predictions in this thread

If you've come across suspicious predictions in the pre-trained models (tagger, parser, entity recognizer) or you want to contribute test cases for a given language, feel free to submit them here. (Test cases should be "fair" and useful for measuring the model's general accuracy, so single words, significant typos and very ambiguous parses aren't usually that helpful.)

You can check out our new models test suite for spaCy v2.1.0 to see the tests we're currently running.

adrianeboyd · 2022-12-20T12:34:45Z

@Woodchucks: We also noticed this, and it appears to be a problem related to the whitespace augmentation in the training settings for a tagger that's trained on its own rather than with a shared tok2vec, where Polish is the only language in the provided trained pipelines with a completely independent tagger component.

To be honest the behavior is pretty bizarre and surprising. It doesn't show up (at least not enough to lead to much lower TAG scores) in evaluations of the dev data, which might be due to fewer unseen tokens in the dev data from the same corpus, and it's still possible there's an underlying bug. We haven't noticed this for other languages, so it seems like training a tagger with a shared tok2vec (with a morphologizer, lemmatizer, and/or parser) prevents the model from predicting that unseen tokens might be _SP, but in this case, the tagger on its own seems to lump whitespace tokens and unseen tokens into the same category.

The upcoming v3.5.0 trained pipelines for Polish should improve this by adding IS_SPACE as a feature so that the model has enough information to differentiate whitespace tokens from other tokens.

Woodchucks · 2022-12-20T13:06:07Z

@adrianeboyd Thank you for the fast reply. I didn't notice your response, so I deleted my comment and published it again as issue #12002. Sorry for the inconvenience. Glad to hear that the new version will have the IS_SPACE feature implemented.

stefan-veezoo · 2022-12-28T15:15:44Z

Hi, I encountered an issue where in German the token "20-Plus" is wrongly tagged as "SPACE", which could hint towards a data issue:

https://demos.explosion.ai/displacy?text=Kunden%20mit%20dem%20Produkt%2020-Plus&model=de_core_news_sm&cpu=1&cph=1

adrianeboyd · 2023-01-09T11:52:38Z

This is related to the same underlying issue as #12002, where data augmentation involving whitespace seems to sometimes lead to unknown words being tagged as SPACE.

Maybe we should just add IS_SPACE to all the models now and consider updating SHAPE to normalize spaces in v4 so that we can drop IS_SPACE, since there's a slight speed hit.
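
One way to spot affected tokens in your own output is to look for tokens tagged SPACE that contain visible characters. This is a minimal sketch: the helper name is made up for illustration, and it relies only on each token exposing `text` and `pos_`, as spaCy tokens do.

```python
def non_whitespace_tagged_space(doc):
    """Return the text of every token that is tagged SPACE
    even though it contains non-whitespace characters."""
    return [
        token.text
        for token in doc
        if token.pos_ == "SPACE" and not token.text.isspace()
    ]


# Usage with a real pipeline (requires the model to be installed):
#   import spacy
#   nlp = spacy.load("de_core_news_sm")
#   print(non_whitespace_tagged_space(nlp("Kunden mit dem Produkt 20-Plus")))
```

An empty list means no visible token was tagged SPACE in that text; it says nothing about other tagging errors.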

probavee · 2023-02-07T16:08:29Z

Hello! Following the answer I got in this discussion, I'm reposting my issue on this master thread.
I'm using the French transformer model fr_dep_news_trf.
When processing the sentence "Je vais skier dans les Alpes de France cet hiver.", the model correctly predicts that "Alpes" is a PROPN.
But when I duplicate the sentence, as in "Je vais skier dans les Alpes de France cet hiver. Je vais skier dans les Alpes de France cet hiver.", it now tags "Alpes" as a NOUN.

Here are 2 examples with different versions of the model, run in a Linux environment with Python 3.10.

spacy-transformers == 1.2.0
spacy == 3.5.0
fr_dep_news_trf == 3.5.0

> doc = nlp("Je vais skier dans les Alpes de France cet hiver.")
> [(i.lemma_, i.pos_) for i in doc if i.text == "Alpes"]

[('Alpes', 'PROPN')]

> doc = nlp("Je vais skier dans les Alpes de France cet hiver. " *10)
> [(i.lemma_, i.pos_) for i in doc if i.text == "Alpes"]

[('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN'), ('alpe', 'NOUN')]

With another version, there are far fewer wrong predictions, but there are still some.

spacy-transformers == 1.1.9
spacy == 3.4.4
fr_dep_news_trf == 3.4.0

> doc = nlp("Je vais skier dans les Alpes de France cet hiver.")
> [(i.lemma_, i.pos_) for i in doc if i.text == "Alpes"]

[('Alpes', 'PROPN')]

> doc = nlp("Je vais skier dans les Alpes de France cet hiver. " *10)
> [(i.lemma_, i.pos_) for i in doc if i.text == "Alpes"]

[('alpe', 'NOUN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('Alpes', 'PROPN'), ('alpe', 'NOUN')]

I'd like to know whether this is expected behaviour from the model. Is it just because I'm not giving it enough context, or is something else going on?
The word France in these sentences is always tagged correctly.
There always seems to be a threshold number of tokens beyond which the predictions go wrong.

Thank you for your help!

postnubilaphoebus · 2023-02-09T09:55:32Z

spaCy's English named entity recognition has issues with apostrophes.
Using spaCy 3.5.0, please try the following code:

import spacy
nlp = spacy.load("en_core_web_sm", disable=["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"])
doc = nlp("That had been Megan's plan when she got him dressed earlier.")
labels = [ent.label_ for ent in doc.ents]
entity_text = [ent.text for ent in doc.ents]
print(labels) 
print(entity_text)

This returns [ORG] for Megan instead of [PERSON]. Similar issues occur with, for example, the word "Applebee's".

rmitsch · 2023-02-09T12:20:26Z

Thanks for reporting this, @postnubilaphoebus. The small model doesn't do that well with names that don't occur often enough in the training data. I recommend giving en_core_web_md a shot (it infers the correct entity label in your example).

stestagg · 2023-03-21T17:44:57Z

Hi!

We've spotted some NSUBJ/DOBJ mixups with parsing sentences using en_core_web_trf (3.5) that start with Make:

For example:

import spacy
print(f'Spacy={spacy.__version__}')
en = spacy.load('en_core_web_trf')
print(f'Lang={en.path.name}')
sent = en('Make the compression used between map reduce tasks configurable.')
' '.join([f'{t}({t.dep_})' for t in sent])

Outputs:

Spacy=3.5.0
Lang=en_core_web_trf-3.5.0

'Make(ROOT) the(det) compression(nsubj) used(acl) between(prep) map(nmod) reduce(compound) tasks(pobj) configurable(ccomp) .(punct)'

There should not be an nsubj in this sentence.
This should be:

'Make(ROOT) the(det) compression(dobj) used(acl) between(prep) map(nmod) reduce(compound) tasks(pobj) configurable(ccomp) .(punct)'

Other examples include:

Make the output of the reduce side plan optimized by the correlation optimizer more reader-friendly.
Make ZooKeeper easier to test - support simulating a connection loss
Make compaction more robust when stats update fails
...

All of these put an nsubj where there should be a dobj.

Note: I also tested 3.3.4 and 3.4.4, and they seemed to do the same thing.

adrianeboyd · 2023-03-22T15:30:49Z

Imperatives and questions are two very common things that most of our trained pipelines perform poorly on because they are rare in typical newspaper training data.

cbowdon · 2023-05-05T08:28:13Z

What is the NER training data for English please? I see some models (e.g. German) are trained on WikiNER but none of the referenced sources for English models (e.g. here) are related to NER.

Apologies if this is the wrong place to ask, I was drawn here from other related issues.

adrianeboyd · 2023-05-05T09:22:04Z

Hi @cbowdon, OntoNotes does contain NER annotation, see: https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf

cbowdon · 2023-05-05T10:40:03Z

@adrianeboyd Thank you!

giova-p · 2023-07-28T08:19:12Z

Hi there!

I've come across an anomaly in the parsing component of the 'en_core_web_sm' model. Specifically, I've noticed that the verb 'need' is sometimes labeled as the root of the sentence, while in other cases, it's labeled as an 'aux'.

Even more strangely, when the same sentence is repeated twice or more, the behavior of the parsing component becomes erratic. Take this example: "the member states need not do something. the member states need not do something." In the first sentence, the subject is a "child" of the root verb 'do', while in the second sentence (which is identical!), the subject is the child of the 'aux'.

I've tried to replicate this behavior with other examples, but the anomaly is not always present. I'd appreciate any insights or suggestions on whether you think this could arise in other circumstances as well.

Thanks!
atb
g.

adrianeboyd · 2023-07-31T06:45:46Z

Hi @giova-p, yes, the predictions of the statistical models depend on a context window that can go beyond a single sentence, so you will see differences like this in practice.

A pipeline should output the same predictions for the exact same input text string every time, but if anything is modified in the text, even adding whitespace, you may see different predictions.
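
To see this effect on your own data, you can compare the tags a word receives when its sentence is processed alone with the tags it receives when the sentence is repeated. This is only a sketch: the function names are hypothetical, and `nlp` is assumed to be any loaded pipeline whose tokens expose `text` and `pos_`.

```python
from collections import Counter


def pos_counts(nlp, text, target):
    """Count the POS tags predicted for every token whose text equals `target`."""
    doc = nlp(text)
    return Counter(token.pos_ for token in doc if token.text == target)


def compare_contexts(nlp, sentence, target, repeats=10):
    """Return the POS counts for `target` in the sentence on its own
    and in the same sentence repeated `repeats` times."""
    alone = pos_counts(nlp, sentence, target)
    repeated = pos_counts(nlp, (sentence + " ") * repeats, target)
    return alone, repeated


# Usage with a real pipeline (requires the model to be installed):
#   import spacy
#   nlp = spacy.load("fr_dep_news_trf")
#   sentence = "Je vais skier dans les Alpes de France cet hiver."
#   print(compare_contexts(nlp, sentence, "Alpes"))
```

If the counts differ, the word is being tagged differently depending on its surrounding context, which is the behaviour described above rather than non-determinism.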

Arjuman23 · 2023-08-20T08:11:24Z

I have identified a discrepancy in the entities detected by the "en_ner_bc5cdr_md-0.5.1" model between results obtained from a Windows system and an Ubuntu system. According to the readme file of the "en_ner_bc5cdr_md-0.5.1" model, it is trained up to Spacy version 3.5.0. Interestingly, this alignment holds true for the Windows system. Whenever I adjust the Spacy version to a value above 3.5.0, the named entity recognition (NER) results are no longer produced. The model en_ner_bc5cdr_md-0.5.0 worked irrespective of the spacy version.

However, an interesting scenario emerged when I conducted the same experiment on an Ubuntu system. Here, the "en_ner_bc5cdr_md-0.5.1" model generated NER outputs regardless of the Spacy version I employed. I even tested it with versions like 3.6.1 and even lower than that.

This leads me to the question: Why is this discrepancy in behavior occurring between the Windows and Ubuntu systems? Is this a known issue? Am I missing something??

svlandeg · 2023-08-21T11:57:18Z

Hi @Arjuman23,

If I understand you correctly, both en_ner_bc5cdr_md-0.5.0 and en_ner_bc5cdr_md-0.5.1 work fine on Ubuntu & Windows within the spaCy ranges specified for these models, right?

From the release notes, I gather that the 0.5.0 models were trained with 3.2.3 and the 0.5.1 models with 3.4.x. Note that we don't actually train or maintain these models - AllenAI does.

In general, you can run python -m spacy validate to double check whether a model in your environment is compatible with the spaCy version. If it's not, I'm afraid we can't really make any guarantees about its behaviour.
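
Conceptually, the compatibility check compares the installed spaCy version against the version range recorded in the model's metadata. The sketch below is not how `spacy validate` is implemented. It is a simplified, dependency-free illustration, and the helper names and the example range `>=3.4.1,<3.5.0` are assumptions made for this example.

```python
import operator

# Two-character operators come first so ">=" is not mistaken for ">".
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(version):
    """Turn '3.4.1' into (3, 4, 1). Only plain dotted integers are handled."""
    return tuple(int(part) for part in version.strip().split("."))


def satisfies(version, constraint):
    """Check a version against a comma-separated range like '>=3.4.1,<3.5.0'."""
    current = parse_version(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                if not compare(current, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported constraint: {clause!r}")
    return True


print(satisfies("3.4.4", ">=3.4.1,<3.5.0"))
print(satisfies("3.6.1", ">=3.4.1,<3.5.0"))
```

A version outside the range does not necessarily make the model fail to load or predict; it only means the combination is untested.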

https://github.com/allenai/scispacy/issues

Arjuman23 · 2023-08-21T12:10:55Z

Hi @svlandeg,
Thank you for your response. Much appreciated.
You've pointed it out right: both models work fine within the spaCy ranges specified in their readme files, but that is on Windows. On Ubuntu, they also work on the latest spaCy versions without any hassle (e.g. 3.6.1).
I totally agree that AllenAI maintains them, but I didn't know how to report this to them. Hence I came down to its roots :P
If you can connect me to them, it would be helpful.

svlandeg · 2023-08-21T12:27:01Z

You could contact them through their issue tracker, but to be honest I'm not sure there's a bug to be solved here. The expected behaviour is that the models work within their range, and not outside of it. They might happen to work on some systems outside of the "correct" spaCy range, for various reasons I'm not sure of. Again, you can ask them / report this to them, but I don't think there's something to be fixed here (I agree it's weird behaviour though).

Mindful · 2023-10-29T04:40:46Z

I'm not sure if this counts as a pre-trained model prediction given that the tokenizer is rule-based, but it looks like spaCy's English tokenizer splits the verb "wed". See below:
https://demos.explosion.ai/displacy?text=The%20couple%20was%20wed%20yesterday.&model=en_core_web_sm&cpu=1&cph=1

If this isn't a mistake, I can imagine it might be a way to deal with common typos of we'd as wed, but it's a little inconvenient.

edit: the same thing happens with the noun cant. I'm not sure if there's a good way to fix this, it seems like you would need POS or syntax information to make judgements about whether something was likely to be a typo or not.

cyriaka90 · 2023-12-21T15:49:54Z

Hey, here are some inaccurate parses I encountered (all using spacy version 3.7.2):

Portuguese (pt_core_news_sm):
- doc = nlp("Reserve voos baratos.")
    print(doc.to_json())
    
    {'text': 'Reserve voos baratos.', 'ents': [{'start': 0, 'end': 7, 'label': 'LOC'}], 'sents': [{'start': 0, 'end': 21}], 'tokens': [{'id': 0, 'start': 0, 'end': 7, 'tag': 'PROPN', 'pos': 'PROPN', 'morph': 'Gender=Fem|Number=Sing', 'lemma': 'Reserve', 'dep': 'ROOT', 'head': 0}, {'id': 1, 'start': 8, 'end': 12, 'tag': 'NOUN', 'pos': 'NOUN', 'morph': 'Gender=Fem|Number=Plur', 'lemma': 'voos', 'dep': 'nsubj', 'head': 0}, {'id': 2, 'start': 13, 'end': 20, 'tag': 'ADJ', 'pos': 'ADJ', 'morph': 'Gender=Masc|Number=Plur', 'lemma': 'barato', 'dep': 'amod', 'head': 1}, {'id': 3, 'start': 20, 'end': 21, 'tag': 'PUNCT', 'pos': 'PUNCT', 'morph': '', 'lemma': '.', 'dep': 'punct', 'head': 0}]}
    
    The verb Reserve is parsed as PROPN and the lemma for voos is given as voos, but should be voo.
- doc = nlp("..., e a maioria das novidades já foram reveladas através de fotos vazadas.")
    
    ....{{'id': 14, 'start': 69, 'end': 74, 'tag': 'AUX', 'pos': 'AUX', 'morph': 'Mood=Ind|Number=Plur|Person=3|VerbForm=Fin', 'lemma': 'ser', 'dep': 'aux:pass', 'head': 15},...
    
    The verb foram should be parsed as past tense.
Greek (el_core_news_sm):
doc = nlp(" Αφήστε τον εαυτό σας να εκπλαγείτε από τις συναρπαστικές δυνατότητες!")

{'text': 'Αφήστε τον εαυτό σας να εκπλαγείτε από τις συναρπαστικές δυνατότητες!', 'ents': [], 'sents': [{'start': 0, 'end': 69}], 'tokens': [{'id': 0, 'start': 0, 'end': 6, 'tag': 'VERB', 'pos': 'VERB', 'morph': 'Aspect=Imp|Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin|Voice=Pass', 'lemma': 'Αφήστε', 'dep': 'ROOT', 'head': 0}, ....

The verb Αφήστε is parsed with mood Ind instead of Imp, also the lemma should be αφήνω.
English (en_core_web_sm):
- doc = nlp("90 % of Australians like him, the most of any country.")
    
    {'text': '90 % of Australians like him, the most of any country.', 'ents': [{'start': 0, 'end': 4, 'label': 'PERCENT'}, {'start': 8, 'end': 19, 'label': 'NORP'}], 'sents': [{'start': 0, 'end': 54}], 'tokens': [{'id': 0, 'start': 0, 'end': 2, 'tag': 'CD', 'pos': 'NUM', 'morph': 'NumType=Card', 'lemma': '90', 'dep': 'nummod', 'head': 1}, {'id': 1, 'start': 3, 'end': 4, 'tag': 'NN', 'pos': 'NOUN', 'morph': 'Number=Sing', 'lemma': '%', 'dep': 'ROOT', 'head': 1}, {'id': 2, 'start': 5, 'end': 7, 'tag': 'IN', 'pos': 'ADP', 'morph': '', 'lemma': 'of', 'dep': 'prep', 'head': 1}, {'id': 3, 'start': 8, 'end': 19, 'tag': 'NNPS', 'pos': 'PROPN', 'morph': 'Number=Plur', 'lemma': 'Australians', 'dep': 'pobj', 'head': 2}, ....
    
    Lemma for Australians should be Australian.
- doc = nlp("Then, as if to show that he could, he collapsed.")
    
    ...{ {'id': 8, 'start': 28, 'end': 33, 'tag': 'MD', 'pos': 'AUX', 'morph': 'VerbForm=Fin', 'lemma': 'could', 'dep': 'ccomp', 'head': 5}, ....
    
    Lemma for verb could should be can with tense=past.
German (de_core_news_sm):
- doc = nlp("Ein Reifen, der sich für längere Strecken genauso gut eignet wie für den Alltag.")
    
    ... {'id': 10, 'start': 54, 'end': 60, 'tag': 'VVPP', 'pos': 'VERB', 'morph': 'VerbForm=Part', 'lemma': 'eignen', 'dep': 'rc', 'head': 1}, {'id': 11, 'start': 61, 'end': 64, 'tag': 'KOKOM', 'pos': 'ADP', 'morph': '', 'lemma': 'wie', 'dep': 'cm', 'head': 12}, {'id': 12, 'start': 65, 'end': 68, 'tag': 'APPR', 'pos': 'ADP', 'morph': '', 'lemma': 'für', 'dep': 'cc', 'head': 5}, {'id': 13, 'start': 69, 'end': 72, 'tag': 'ART', 'pos': 'DET', 'morph': 'Case=Acc|Definite=Def|Gender=Masc|Number=Sing|PronType=Art', 'lemma': 'der', 'dep': 'nk', 'head': 14}, {'id': 14, 'start': 73, 'end': 79, 'tag': 'NN', 'pos': 'NOUN', 'morph': 'Case=Acc|Gender=Masc|Number=Sing', 'lemma': 'Alltag', 'dep': 'nk', 'head': 12}, {'id': 15, 'start': 79, 'end': 80, 'tag': '$.', 'pos': 'PUNCT', 'morph': '', 'lemma': '--', 'dep': 'punct', 'head': 1}]}
    
    Verb eignet should be present tense and not participle.
- doc = nlp("Das Epad lässt sich problemlos zum Picknick mitnehmen.")
    
    {'text': 'Das Epad lässt sich problemlos zum Picknick mitnehmen.', 'ents': [], 'sents': [{'start': 0, 'end': 54}], 'tokens': [{'id': 0, 'start': 0, 'end': 3, 'tag': 'ART', 'pos': 'DET', 'morph': 'Case=Nom|Definite=Def|Gender=Neut|Number=Sing|PronType=Art', 'lemma': 'der', 'dep': 'nk', 'head': 1}, {'id': 1, 'start': 4, 'end': 8, 'tag': 'NN', 'pos': 'NOUN', 'morph': 'Case=Nom|Gender=Neut|Number=Sing', 'lemma': 'Epad', 'dep': 'sb', 'head': 2}, {'id': 2, 'start': 9, 'end': 14, 'tag': 'VVFIN', 'pos': 'VERB', 'morph': 'Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin', 'lemma': 'lässn', 'dep': 'ROOT', 'head': 2}, ...
    
    The lemma for verb lässt should be lassen, not lässn.
- doc = nlp("So findet auch der stressigste Tag einen leckeren und entspannten Abschluss.")
    
    ....{'id': 7, 'start': 41, 'end': 49, 'tag': 'ADJA', 'pos': 'ADJ', 'morph': 'Case=Acc|Degree=Cmp|Gender=Masc|Number=Sing', 'lemma': 'leck', 'dep': 'nk', 'head': 10}, {'id': 8, 'start': 50, 'end': 53, 'tag': 'KON', 'pos': 'CCONJ', 'morph': '', 'lemma': 'und', 'dep': 'cd', 'head': 7}, {'id': 9, 'start': 54, 'end': 65, 'tag': 'ADJA', 'pos': 'ADJ', 'morph': 'Case=Acc|Degree=Pos|Gender=Masc|Number=Sing', 'lemma': 'entspannt', 'dep': 'cj', 'head': 8}, {'id': 10, 'start': 66, 'end': 75, 'tag': 'NN', 'pos': 'NOUN', 'morph': 'Case=Dat|Gender=Masc|Number=Plur', 'lemma': 'Abschluss', 'dep': 'oa', 'head': 1}, {'id': 11, 'start': 75, 'end': 76, 'tag': '$.', 'pos': 'PUNCT', 'morph': '', 'lemma': '--', 'dep': 'punct', 'head': 1}]}
    
    The lemma for adjective leckeren should be lecker, not leck.
Croatian (hr_core_news_sm):
doc = nlp("Kupi jabuku i knjigu.")

{'text': 'Kupi jabuku i knjigu.', 'ents': [], 'sents': [{'start': 0, 'end': 21}], 'tokens': [{'id': 0, 'start': 0, 'end': 4, 'tag': 'Vmr3s', 'pos': 'VERB', 'morph': 'Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin', 'lemma': 'Kupi', 'dep': 'ROOT', 'head': 0},...

Lemma for verb kupi should be kupiti.
Italian (it_core_news_sm):
doc = nlp("Prenota voli economici.")

{'text': 'Prenota voli economici.', 'ents': [{'start': 0, 'end': 7, 'label': 'MISC'}], 'sents': [{'start': 0, 'end': 23}], 'tokens': [{'id': 0, 'start': 0, 'end': 7, 'tag': 'S', 'pos': 'NOUN', 'morph': 'Gender=Fem|Number=Sing', 'lemma': 'prenota', 'dep': 'nmod', 'head': 1}, {'id': 1, 'start': 8, 'end': 12, 'tag': 'S', 'pos': 'NOUN', 'morph': 'Gender=Masc|Number=Plur', 'lemma': 'volo', 'dep': 'ROOT', 'head': 1}, {'id': 2, 'start': 13, 'end': 22, 'tag': 'A', 'pos': 'ADJ', 'morph': 'Gender=Masc|Number=Plur', 'lemma': 'economico', 'dep': 'amod', 'head': 1}, {'id': 3, 'start': 22, 'end': 23, 'tag': 'FS', 'pos': 'PUNCT', 'morph': '', 'lemma': '.', 'dep': 'punct', 'head': 1}]}
The verb Prenota is parsed as NOUN instead of VERB.

glangford · 2024-01-15T20:27:13Z

The following Portuguese sentences, which all start with a capitalized verb, result in an incorrect lemma for the verb (pt_core_news_lg, spaCy 3.7.2):

"Trabalharam com honra e dignidade e estiveram entre os melhores."
"Fale só um bocadinho sobre o Festival."
"Surge detrás das cortinas."
"Encontrei as chaves."
"Reserve voos baratos." (this one is from an earlier comment in this thread)

In each case, the lemma of the first word is given as the word unchanged.

If the first word is lower cased, the correct lemmas are produced (trabalhar, falar, surgir, encontrar, reservar).
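
Here is a small helper for reproducing this comparison. It is only a sketch: `lowercase_first` and `first_token_lemmas` are hypothetical names, and `nlp` is assumed to be any loaded pipeline. Lowercasing the first letter is meant as a diagnostic rather than a recommended fix, since changing the input can also change other predictions.

```python
def lowercase_first(text):
    """Return `text` with only its first character lowercased."""
    return text[:1].lower() + text[1:]


def first_token_lemmas(nlp, sentence):
    """Return the lemma of the first token for the sentence as written
    and for the same sentence with its first letter lowercased."""
    as_written = nlp(sentence)[0].lemma_
    lowered = nlp(lowercase_first(sentence))[0].lemma_
    return as_written, lowered


# Usage with a real pipeline (requires the model to be installed):
#   import spacy
#   nlp = spacy.load("pt_core_news_lg")
#   for sentence in ["Encontrei as chaves.", "Reserve voos baratos."]:
#       print(sentence, first_token_lemmas(nlp, sentence))
```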

joprice · 2024-08-21T23:20:13Z

In the sentence "Nós compartilharemos.", the Portuguese word compartilharemos gets the lemma compartilharemo when the sentence starts with a capital letter, and compartilharer when the initial letter is lowercased. The expected lemma is compartilhar.

jomra · 2024-09-13T04:55:02Z

In all the Spanish models I’ve tried, from small to large, the lemma of tendientes is resolved as tendient, which isn’t actually a word (it should be the singular form tendiente). The singular tendiente resolves correctly to tendiente. In all the cases I’ve tried, the word was in lower case, though at various positions in the original string.

ivan-kleshnin · 2024-12-27T10:24:58Z

Common Ex- prefix interpretation is completely broken (tested with multiple examples):

text: 'A director.'

[{'dep': 'det', 'head': director, 'pos': 'DET', 'tag': 'DT', 'token': A},
 {'dep': 'ROOT',
  'head': director,
  'pos': 'NOUN',
  'tag': 'NN',
  'token': director},
 {'dep': 'punct', 'head': director, 'pos': 'PUNCT', 'tag': '.', 'token': .}]

noun_chunks: [A director]

text: 'An ex-director.'

[{'dep': 'det', 'head': ex, 'pos': 'DET', 'tag': 'DT', 'token': An},
 {'dep': 'ROOT', 'head': ex, 'pos': 'NOUN', 'tag': 'NN', 'token': ex},
 {'dep': 'dobj', 'head': ex, 'pos': 'NOUN', 'tag': 'NN', 'token': -},
 {'dep': 'npadvmod', 'head': ex, 'pos': 'NOUN', 'tag': 'NN', 'token': director},
 {'dep': 'punct', 'head': ex, 'pos': 'PUNCT', 'tag': '.', 'token': .}]

noun_chunks: [An ex, -]

It is a systematic error, ex- is persistently treated as the main word in all ex-something combinations I tested.
Sm/md/lg models make no difference here.

Something similar happens with co-. E.g. co-founder is interpreted as co<-founder instead of co->founder (arrows point to heads). If I hack the tokenization to merge the above to a single token, especially with ex-, it significantly increases the number of nouns I should care about (in my heuristics). And other dash-split words are normally separated. So it's not a solution.
