Text Embeddings Inference on Habana Gaudi

Get started

To use 🤗 text-embeddings-inference on Habana Gaudi/Gaudi2, follow these steps:

Pull the official Docker image with:

docker pull ghcr.io/huggingface/tei-gaudi:latest

Note: you can also build the Docker image yourself using Dockerfile-hpu located in this folder with:

docker build -f Dockerfile-hpu -t tei_gaudi .

Launch a local server instance on 1 Gaudi card:

model=BAAI/bge-large-en-v1.5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e MAX_WARMUP_SEQUENCE_LENGTH=512 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tei-gaudi:latest --model-id $model --pooling cls

For models in the Transformers library that need remote code to run custom implementations, pass -e TRUST_REMOTE_CODE=TRUE to the docker run command. For example:

model="Alibaba-NLP/gte-large-en-v1.5"
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e MAX_WARMUP_SEQUENCE_LENGTH=512 -e TRUST_REMOTE_CODE=TRUE --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tei-gaudi:latest --model-id $model --pooling cls
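The two launch commands above differ only in the model ID and the TRUST_REMOTE_CODE variable. As a rough sketch of how those pieces fit together, the snippet below assembles the same argument list in Python. The helper name `tei_gaudi_command` and its parameters are hypothetical and not part of tei-gaudi; the flags and environment variables are copied from the commands above.

```python
import shlex


def tei_gaudi_command(model_id, volume, pooling=None, trust_remote_code=False,
                      image="ghcr.io/huggingface/tei-gaudi:latest", port=8080):
    """Assemble the `docker run` argument list shown in this README (hypothetical helper)."""
    env = {
        "HABANA_VISIBLE_DEVICES": "all",
        "OMPI_MCA_btl_vader_single_copy_mechanism": "none",
        "MAX_WARMUP_SEQUENCE_LENGTH": "512",
    }
    if trust_remote_code:
        # Only needed for models that ship custom modeling code.
        env["TRUST_REMOTE_CODE"] = "TRUE"

    cmd = ["docker", "run", "-p", f"{port}:80", "-v", f"{volume}:/data", "--runtime=habana"]
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    cmd += ["--cap-add=sys_nice", "--ipc=host", image, "--model-id", model_id]
    if pooling is not None:
        cmd += ["--pooling", pooling]
    return cmd


cmd = tei_gaudi_command("Alibaba-NLP/gte-large-en-v1.5", "/tmp/data",
                        pooling="cls", trust_remote_code=True)
# Print a shell-quoted command line; pass `cmd` to subprocess.run to launch it instead.
print(shlex.join(cmd))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.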

You can then send a request:

 curl 127.0.0.1:8080/embed \
     -X POST \
     -d '{"inputs":"What is Deep Learning?"}' \
     -H 'Content-Type: application/json'
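The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server from the previous step is listening on 127.0.0.1:8080 and that /embed returns one list of floats per input; the function names are illustrative.

```python
import json
import urllib.request


def build_embed_request(inputs, base_url="http://127.0.0.1:8080"):
    """Build the POST request for /embed; `inputs` is a string or a list of strings."""
    payload = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(inputs, base_url="http://127.0.0.1:8080"):
    """Send the request and decode the JSON body (assumed: one vector per input)."""
    with urllib.request.urlopen(build_embed_request(inputs, base_url)) as response:
        return json.loads(response.read())


# Requires a running server:
# vectors = embed("What is Deep Learning?")
# print(len(vectors), len(vectors[0]))
```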

For more information and documentation about Text Embeddings Inference, check out the README of the original repo.

Supported Models

Text Embeddings

tei-gaudi currently supports Nomic, BERT, CamemBERT, and XLM-RoBERTa models with absolute positions, the JinaBERT model with ALiBi positions, and Mistral, Alibaba GTE, and Qwen2 models with RoPE positions.

Below are some examples of our validated models:

Architecture	Pooling	Models
BERT	Cls/Mean/Last token	BAAI/bge-large-en-v1.5, sentence-transformers/all-MiniLM-L6-v2, sentence-transformers/all-MiniLM-L12-v2, sentence-transformers/multi-qa-MiniLM-L6-cos-v1, sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2, sentence-transformers/paraphrase-MiniLM-L3-v2
BERT	Splade	naver/efficient-splade-VI-BT-large-query
MPNet	Cls/Mean/Last token	sentence-transformers/all-mpnet-base-v2, sentence-transformers/paraphrase-multilingual-mpnet-base-v2, sentence-transformers/multi-qa-mpnet-base-dot-v1
ALBERT	Cls/Mean/Last token	sentence-transformers/paraphrase-albert-small-v2
Mistral	Cls/Mean/Last token	intfloat/e5-mistral-7b-instruct, Salesforce/SFR-Embedding-2_R
GTE	Cls/Mean/Last token	Alibaba-NLP/gte-large-en-v1.5
JinaBERT	Cls/Mean/Last token	jinaai/jina-embeddings-v2-base-en

Sequence Classification and Re-Ranking

tei-gaudi currently supports CamemBERT and XLM-RoBERTa Sequence Classification models with absolute positions.

Below are some examples of the currently supported models:

Task	Model Type	Model ID
Re-Ranking	XLM-RoBERTa	BAAI/bge-reranker-large
Re-Ranking	XLM-RoBERTa	BAAI/bge-reranker-base
Sentiment Analysis	RoBERTa	SamLowe/roberta-base-go_emotions

How to Use

Using Re-ranker models

model=BAAI/bge-reranker-large
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e MAX_WARMUP_SEQUENCE_LENGTH=512 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tei-gaudi:latest --model-id $model

You can then rank the similarity between a query and a list of texts with:

curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
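To map the scores back to the input texts, something like the following may help. It is a sketch that assumes /rerank returns a JSON list of objects with an `index` into the submitted `texts` and a relevance `score`; the sample response is made up for illustration and is not real model output.

```python
def order_texts(texts, results):
    """Pair each text with its score, most relevant first.

    `results` is assumed to look like [{"index": 1, "score": 0.98}, ...].
    """
    ranked = sorted(results, key=lambda item: item["score"], reverse=True)
    return [(texts[item["index"]], item["score"]) for item in ranked]


texts = ["Deep Learning is not...", "Deep learning is..."]
# Hypothetical response body, for illustration only.
sample_results = [{"index": 0, "score": 0.02}, {"index": 1, "score": 0.98}]

for text, score in order_texts(texts, sample_results):
    print(f"{score:.2f}  {text}")
```

Sorting explicitly avoids relying on the order in which the server returns results.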

Using Sequence Classification models

You can also use classic Sequence Classification models like SamLowe/roberta-base-go_emotions:

model=SamLowe/roberta-base-go_emotions
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e MAX_WARMUP_SEQUENCE_LENGTH=512 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tei-gaudi:latest --model-id $model

Once you have deployed the model you can use the predict endpoint to get the emotions most associated with an input:

curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json'
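go_emotions is a multi-label task, so a client will often keep every label above a cutoff instead of only the top one. The sketch below assumes /predict returns a JSON list of objects with `label` and `score` fields for a single input; the sample scores and the 0.5 threshold are made up for illustration.

```python
def labels_above(predictions, threshold=0.5):
    """Keep predictions whose score reaches the threshold, highest score first."""
    kept = [p for p in predictions if p["score"] >= threshold]
    return sorted(kept, key=lambda p: p["score"], reverse=True)


# Hypothetical response body, for illustration only.
sample_predictions = [
    {"label": "admiration", "score": 0.12},
    {"label": "love", "score": 0.91},
    {"label": "neutral", "score": 0.03},
]

print([p["label"] for p in labels_above(sample_predictions)])
```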

Using SPLADE pooling

You can choose to activate SPLADE pooling for BERT and DistilBERT MaskedLM architectures:

docker build -f Dockerfile-hpu -t tei_gaudi .
model=naver/efficient-splade-VI-BT-large-query
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e MAX_WARMUP_SEQUENCE_LENGTH=512 --cap-add=sys_nice --ipc=host tei_gaudi --model-id $model --pooling splade

Once you have deployed the model you can use the /embed_sparse endpoint to get the sparse embedding:

curl 127.0.0.1:8080/embed_sparse \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json'
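Sparse embeddings are typically compared with a dot product over the vocabulary indices the two vectors share. The sketch below assumes /embed_sparse returns, for each input, a list of objects with an `index` and a `value`; the sample entries are made up and do not correspond to real token IDs.

```python
def to_sparse_dict(entries):
    """Convert [{"index": i, "value": v}, ...] into {i: v}."""
    return {entry["index"]: entry["value"] for entry in entries}


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(value * b[index] for index, value in a.items() if index in b)


# Hypothetical sparse embeddings, for illustration only.
query = to_sparse_dict([{"index": 1, "value": 0.5}, {"index": 7, "value": 2.0}])
document = to_sparse_dict([{"index": 7, "value": 1.5}, {"index": 9, "value": 1.0}])

print(sparse_dot(query, document))
```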

TEI on Habana Gaudi is distributed under the same license as TEI: https://github.com/huggingface/text-embeddings-inference/blob/main/LICENSE

Please reach out to [email protected] if you have any questions.
