Releases: explosion/spacy-models
pt_core_news_sm-2.1.0a6
Details: https://spacy.io/models/pt#pt_core_news_sm
File checksum:
306516b5b761ce7a20c6adb719a570b0b3d432e35f188fca06be9f1a7f42406d
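A downloaded archive can be checked against the file checksum listed for each model. The sketch below assumes the checksum is a SHA-256 digest (it has 64 hexadecimal characters, but the release notes do not name the algorithm); the helper names and the archive file name in the usage note are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large model archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with a published checksum (assumed SHA-256)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum("pt_core_news_sm-2.1.0a6.tar.gz", "306516b5b761ce7a20c6adb719a570b0b3d432e35f188fca06be9f1a7f42406d")` would return `True` only if the local file (the name here is an assumption) hashes to the published value.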
Portuguese multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | pt_core_news_sm |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 12 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Universal Dependencies, Wikipedia |
License | CC BY-SA 4.0 |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 89.14 |
ENTS_P | 89.23 |
ENTS_R | 89.04 |
LAS | 86.02 |
TAGS_ACC | 80.44 |
TOKEN_ACC | 100.00 |
UAS | 89.36 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.
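The ENTS_F score is related to ENTS_P (precision) and ENTS_R (recall) through the standard F1 formula, the harmonic mean of the two. The sketch below recomputes it from the rounded values in these release notes; the function name is illustrative, and because the published scores are presumably computed from unrounded counts, the recomputed value can differ in the last digit.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (ENTS_P, ENTS_R, reported ENTS_F) copied from the accuracy tables.
reported = {
    "pt_core_news_sm": (89.23, 89.04, 89.14),
    "en_core_web_sm": (85.66, 85.33, 85.49),
}

for name, (p, r, f) in reported.items():
    # Print the recomputed score next to the published one for comparison.
    print(name, round(f_score(p, r), 2), f)
```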
Installation
pip install spacy-nightly
spacy download pt_core_news_sm
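Once installed, a model is an ordinary Python package that can be loaded by name with `spacy.load`. The following is a minimal sketch, not an official usage recipe: the helper functions and the example sentence are assumptions made for illustration, and the check for an installed package is done with the standard library so that the script fails gracefully when the model is missing.

```python
import importlib.util


def model_is_installed(name):
    """Return True if a package with this name is importable."""
    return importlib.util.find_spec(name) is not None


def load_if_available(name):
    """Load the model with spacy.load, or return None if it is not installed."""
    if not model_is_installed(name):
        return None
    import spacy  # imported lazily so the check above works without spaCy

    return spacy.load(name)


if __name__ == "__main__":
    nlp = load_if_available("pt_core_news_sm")
    if nlp is None:
        print("pt_core_news_sm is not installed")
    else:
        doc = nlp("A Maria mora em Lisboa e trabalha na Universidade de Coimbra.")
        # POS tags and dependency labels come from the tagger and parser.
        for token in doc:
            print(token.text, token.pos_, token.dep_)
        # Named entities come from the ner component (PER, LOC, ORG, MISC).
        for ent in doc.ents:
            print(ent.text, ent.label_)
```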
nl_core_news_sm-2.1.0a6
Details: https://spacy.io/models/nl#nl_core_news_sm
File checksum:
8eb8bf0133694bfa28a6f27dcc44c178b15ed9c08f38f54e7a2cb351e7618c7d
Dutch multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | nl_core_news_sm |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Universal Dependencies, Wikipedia |
License | CC BY-SA 4.0 |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 87.05 |
ENTS_P | 86.56 |
ENTS_R | 87.54 |
LAS | 77.56 |
TAGS_ACC | 91.47 |
TOKEN_ACC | 100.00 |
UAS | 83.72 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy-nightly
spacy download nl_core_news_sm
it_core_news_sm-2.1.0a6
Details: https://spacy.io/models/it#it_core_news_sm
File checksum:
537c0c85d112a8f5d9c2e1d11049b273839767182eb2e085baebebb73843fe32
Italian multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | it_core_news_sm |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Universal Dependencies, Wikipedia |
License | CC BY-NC-SA 3.0 |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 86.41 |
ENTS_P | 86.63 |
ENTS_R | 86.18 |
LAS | 87.18 |
TAGS_ACC | 95.91 |
TOKEN_ACC | 100.00 |
UAS | 90.93 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy-nightly
spacy download it_core_news_sm
fr_core_news_sm-2.1.0a6
Details: https://spacy.io/models/fr#fr_core_news_sm
File checksum:
db806c0e640d4ac9c11471461e2884e0a8c2fa91ddbacf0dea078c018afb79a9
French multi-task CNN trained on the French Sequoia (Universal Dependencies) and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | fr_core_news_sm |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 12 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Sequoia Corpus (UD), Wikipedia |
License | LGPL |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 82.87 |
ENTS_P | 82.97 |
ENTS_R | 82.77 |
LAS | 84.76 |
TAGS_ACC | 94.54 |
TOKEN_ACC | 100.00 |
UAS | 87.67 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy-nightly
spacy download fr_core_news_sm
fr_core_news_md-2.1.0a6
Details: https://spacy.io/models/fr#fr_core_news_md
File checksum:
b2b76fc3f3313b7492f15b6339191419095379289ccbbbc14c67f7991efcf13e
French multi-task CNN trained on the French Sequoia (Universal Dependencies) and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | fr_core_news_md |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 81 MB |
Pipeline | tagger, parser, ner |
Vectors | 579447 keys, 20000 unique vectors (300 dimensions) |
Sources | Sequoia Corpus (UD), Wikipedia |
License | LGPL |
Author | Explosion AI |
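The Vectors row above lists far more keys than unique vectors. In spaCy this happens when several keys point to the same row of the vector table, so that many words share one stored vector. The toy sketch below illustrates that idea in plain Python; the words, numbers, and names are invented for illustration and are not taken from the model.

```python
# A tiny vector table with two rows of three dimensions each.
vector_rows = [
    [0.1, 0.2, 0.3],
    [0.9, 0.8, 0.7],
]

# Several keys map to the same row, so there are more keys than vectors.
key_to_row = {
    "chat": 0,
    "chats": 0,
    "chaton": 0,
    "voiture": 1,
    "auto": 1,
}


def lookup(key):
    """Return the stored vector for a key by following its row index."""
    return vector_rows[key_to_row[key]]


n_keys = len(key_to_row)
n_unique_vectors = len(set(key_to_row.values()))
n_dimensions = len(vector_rows[0])
print(f"{n_keys} keys, {n_unique_vectors} unique vectors ({n_dimensions} dimensions)")
```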
Accuracy
Type | Score |
---|---|
ENTS_F | 83.36 |
ENTS_P | 83.48 |
ENTS_R | 83.25 |
LAS | 86.48 |
TAGS_ACC | 95.12 |
TOKEN_ACC | 100.00 |
UAS | 89.14 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy-nightly
spacy download fr_core_news_md
es_core_news_sm-2.1.0a6
Details: https://spacy.io/models/es#es_core_news_sm
File checksum:
56473ffbdb1bd125881681a161c5a3bbbd13f5e76b3fbc5bf4e2b09adf541615
Spanish multi-task CNN trained on the AnCora and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | es_core_news_sm |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | AnCora, Wikipedia |
License | GPL |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 88.98 |
ENTS_P | 89.06 |
ENTS_R | 88.90 |
LAS | 87.28 |
TAGS_ACC | 97.03 |
TOKEN_ACC | 100.00 |
UAS | 90.33 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy-nightly
spacy download es_core_news_sm
es_core_news_md-2.1.0a6
Details: https://spacy.io/models/es#es_core_news_md
File checksum:
77c0a1b9ebd2cf32644bf1c592c0b1d14041a0202f7a7ef8d0be3b68afa44519
Spanish multi-task CNN trained on the AnCora and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | es_core_news_md |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 69 MB |
Pipeline | tagger, parser, ner |
Vectors | 533736 keys, 20000 unique vectors (50 dimensions) |
Sources | AnCora, Wikipedia |
License | GPL |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 89.30 |
ENTS_P | 89.42 |
ENTS_R | 89.19 |
LAS | 88.06 |
TAGS_ACC | 97.18 |
TOKEN_ACC | 100.00 |
UAS | 90.87 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy-nightly
spacy download es_core_news_md
en_core_web_sm-2.1.0a6
Details: https://spacy.io/models/en#en_core_web_sm
File checksum:
927785b2aabb43d888437295a11b071798570dbd8c67cf80c611bc1c6927898c
English multi-task CNN trained on OntoNotes. Assigns context-specific token vectors, POS tags, dependency parse and named entities.
Feature | Description |
---|---|
Name | en_core_web_sm |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | OntoNotes 5 |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 85.49 |
ENTS_P | 85.66 |
ENTS_R | 85.33 |
LAS | 89.64 |
TAGS_ACC | 96.80 |
TOKEN_ACC | 99.06 |
UAS | 91.53 |
Installation
pip install spacy-nightly
spacy download en_core_web_sm
en_core_web_md-2.1.0a6
Details: https://spacy.io/models/en#en_core_web_md
File checksum:
ea971369a13056cee2bddaaf1c5b342b16bc0a0f45228abde4b4c4635f469f1f
English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.
Feature | Description |
---|---|
Name | en_core_web_md |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 91 MB |
Pipeline | tagger, parser, ner |
Vectors | 684830 keys, 20000 unique vectors (300 dimensions) |
Sources | OntoNotes 5, Common Crawl |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 86.40 |
ENTS_P | 86.50 |
ENTS_R | 86.30 |
LAS | 90.16 |
TAGS_ACC | 96.96 |
TOKEN_ACC | 99.06 |
UAS | 91.94 |
Installation
pip install spacy-nightly
spacy download en_core_web_md
en_core_web_lg-2.1.0a6
Details: https://spacy.io/models/en#en_core_web_lg
File checksum:
6ee2325f253b8f74693c07311071eab99e504acfc37f8da7a6a88a53fb0496f9
English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.
Feature | Description |
---|---|
Name | en_core_web_lg |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 788 MB |
Pipeline | tagger, parser, ner |
Vectors | 684830 keys, 684831 unique vectors (300 dimensions) |
Sources | OntoNotes 5, Common Crawl |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 86.62 |
ENTS_P | 86.68 |
ENTS_R | 86.57 |
LAS | 90.20 |
TAGS_ACC | 97.02 |
TOKEN_ACC | 99.06 |
UAS | 91.97 |
Installation
pip install spacy-nightly
spacy download en_core_web_lg