Strange parse of Italian sentence. #3008

Closed
iliakur opened this issue Dec 4, 2018 · 4 comments
Labels
lang / it Italian language data and models perf / accuracy Performance: accuracy

Comments

@iliakur

iliakur commented Dec 4, 2018

How to reproduce the behaviour

JSON formatting is mine.

>>> import spacy
>>> model = spacy.load("it_core_news_sm", disable=["ner"])
>>> doc = model("la gatta mangia pesce")
>>> doc.print_tree()
[
    {
        "word":"pesce",
        "lemma":"pesce",
        "NE":"",
        "POS_fine":"S__Gender=Masc|Number=Sing",
        "POS_coarse":"NOUN",
        "arc":"ROOT",
        "modifiers":[
            {
                "word":"mangia",
                "lemma":"mangiare",
                "NE":"",
                "POS_fine":"V__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",
                "POS_coarse":"VERB",
                "arc":"nsubj",
                "modifiers":[
                    {
                        "word":"la",
                        "lemma":"la",
                        "NE":"",
                        "POS_fine":"RD__Definite=Def|Gender=Fem|Number=Sing|PronType=Art",
                        "POS_coarse":"DET",
                        "arc":"det",
                        "modifiers":[

                        ]
                    },
                    {
                        "word":"gatta",
                        "lemma":"gatta",
                        "NE":"",
                        "POS_fine":"S__Gender=Fem|Number=Sing",
                        "POS_coarse":"NOUN",
                        "arc":"amod",
                        "modifiers":[

                        ]
                    }
                ]
            }
        ]
    }
]

The parse makes pesce the ROOT of the sentence, when it is actually the object of mangia. The remaining dependencies are wrong as a consequence: mangia is attached as nsubj and gatta as amod.
Interestingly, the POS tagger output looks fine; the parser just seems to ignore the POS tags completely.
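
For anyone who wants to spot this kind of mismatch automatically, here is a minimal sketch that walks a print_tree()-style structure and flags POS/arc combinations that look implausible. The rules in `UNLIKELY_PAIRS` and the noun-as-ROOT check are my own illustrative heuristics, not anything spaCy provides, and the `tree` literal is a trimmed copy of the output above (only the fields the check needs).

```python
# Hypothetical heuristic: (coarse POS, arc) pairs that rarely make sense together.
UNLIKELY_PAIRS = {("VERB", "nsubj"), ("NOUN", "amod")}


def iter_nodes(nodes):
    """Yield every node of a print_tree()-style list, depth first."""
    for node in nodes:
        yield node
        yield from iter_nodes(node["modifiers"])


def suspicious_arcs(tree):
    """Return (word, POS, arc) for nodes whose arc does not fit their POS tag."""
    nodes = list(iter_nodes(tree))
    has_verb = any(n["POS_coarse"] == "VERB" for n in nodes)
    flagged = []
    for n in nodes:
        pos, arc = n["POS_coarse"], n["arc"]
        # A noun as ROOT is suspicious only when the sentence also has a verb.
        if arc == "ROOT" and pos == "NOUN" and has_verb:
            flagged.append((n["word"], pos, arc))
        elif (pos, arc) in UNLIKELY_PAIRS:
            flagged.append((n["word"], pos, arc))
    return flagged


# Trimmed copy of the parse reported above.
tree = [
    {"word": "pesce", "POS_coarse": "NOUN", "arc": "ROOT", "modifiers": [
        {"word": "mangia", "POS_coarse": "VERB", "arc": "nsubj", "modifiers": [
            {"word": "la", "POS_coarse": "DET", "arc": "det", "modifiers": []},
            {"word": "gatta", "POS_coarse": "NOUN", "arc": "amod", "modifiers": []},
        ]},
    ]},
]

print(suspicious_arcs(tree))
# [('pesce', 'NOUN', 'ROOT'), ('mangia', 'VERB', 'nsubj'), ('gatta', 'NOUN', 'amod')]
```

A check like this only catches disagreements between the two components; it cannot tell which of them is wrong.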

Your Environment

  • Operating System: Ubuntu 18.04
  • Python Version Used: Python 3.6.7
  • spaCy Version Used: 2.0.9
  • Environment Information:
    models: en, fr, de, nl, it
@honnibal
Member

honnibal commented Dec 6, 2018

The parser and POS tagger are independent: the parser doesn't use POS tags as features. So yes, sometimes the parser makes errors that don't match the POS tags, and sometimes vice versa.
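
Since the two components predict separately, it can help to print their outputs side by side per token. The following is a rough sketch, assuming the same `it_core_news_sm` model as in the report; `tag_and_arc_rows` and `print_rows_for` are hypothetical helper names, while `pos_`, `dep_` and `head` are the regular token attributes.

```python
def tag_and_arc_rows(doc):
    """Return (text, coarse POS, dependency label, head text) for each token."""
    return [(t.text, t.pos_, t.dep_, t.head.text) for t in doc]


def print_rows_for(sentences, model_name="it_core_news_sm"):
    """Parse each sentence and print tagger and parser output side by side."""
    import spacy  # imported lazily so the helper above works without spaCy

    nlp = spacy.load(model_name, disable=["ner"])
    for sentence in sentences:
        for row in tag_and_arc_rows(nlp(sentence)):
            print(*row, sep="\t")
        print()
```

Calling `print_rows_for(["la gatta mangia pesce", "la gatta mangia il pesce"])` makes it easy to see where the dependency label and the POS tag disagree for the same token. The actual labels depend on the installed model version.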

@honnibal honnibal added lang / it Italian language data and models perf / accuracy Performance: accuracy labels Dec 6, 2018
@iliakur
Author

iliakur commented Dec 6, 2018

We played around a bit more with the model and for what it's worth, these variations of the sentence get parsed correctly:

  • la gatta mangia il pesce
  • il gatto mangia pesce

@ines
Member

ines commented Dec 14, 2018

Merging this with #3052. We've now added a master thread for incorrect predictions and related reports – see the issue for more details.

@ines ines closed this as completed Dec 14, 2018
@lock

lock bot commented Jan 13, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 13, 2019