-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding chat completion task to endpoint models #281
base: main
Are you sure you want to change the base?
Adding chat completion task to endpoint models #281
Conversation
src/lighteval/few_shot_manager.py
Outdated
@@ -181,35 +182,33 @@ def init_fewshot_sampling_balanced( | |||
def get_examples_with_chat_template( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to change this method to return List[ChatCompletionInputMessage]
as InferenceClient.chat_completion()
doesn't accept string. I made changes accordingly to BaseModel
and NanotronModel
to consider conversational contexts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment now is relevant to PromptManager.get_examples()
.
src/lighteval/few_shot_manager.py
Outdated
@@ -220,7 +219,7 @@ def get_examples( | |||
return instruction + labeled_examples + example | |||
|
|||
def create_multi_turn_contexts( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create a follow-up PR for multi-turn contexts to work with ChatCompletionInputMessage
instead of str in FewshotManager
and BaseModel
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this is relevant to PromptManager._multi_turn_contexts()
.
) | ||
from lighteval.utils.utils import EnvConfig, as_list | ||
|
||
|
||
EndpointInput: TypeAlias = TextGenerationInput | ChatCompletionInput | ||
EndpointOutput: TypeAlias = TextGenerationOutput | ChatCompletionOutput |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes I made to endpoint model was to pave the way for the day Lighteval might add evaluation of commercial models, or add the evaluation of other base tasks e.g. visual question answering, reusing most of the logic in the parent endpoint model. Endpoint model methods are organized as follows:
greedy_until()
,loglikelihood()
,loglikelihood_rolling()
: public apis of the model that could be reused in inheriting endpoint models. These methods call_process_batch()
or_async_process_batch()
_process_batch()
and_async_process_batch()
: for batch processing and could be reused in inheriting endpoint models. They call_prepare_request()
and then_process_request()
._prepare_request()
: bears the responsibility to convert the incoming request toEndpointInput
which is one of thehuggingface_hub.InferenceType
predefined types. This also could be reused among different endpoint classes._process_request()
: given theEndpointInput
, it creates theEndpointOutput
using the client. This is somewhat endpoint specific._process_generate_response()
and_process_logprob_response()
: endpoint specific methods taking care of creatingModelResponse
using theEndpointOutput
. Before, these were part of thegreedy_until()
andloglikelihood()
methods.
Specifically, I wanted to propose this directory structure for endpoint models:
lighteval/
models/
endpoints/
endpoint_model.py
inference_endpoint_model.py
tgi_model.py
anthropic_model.py
openai_model.py
in which endpoint_model.py
holds most of the logic and the child models override some methods if necessary.
from lighteval.utils.imports import NO_TGI_ERROR_MSG, is_tgi_available | ||
|
||
|
||
if is_tgi_available(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TGI recommends using huggnigface_hub
over text-generation
.
https://github.com/huggingface/text-generation-inference/tree/main/clients/python
@@ -38,6 +44,9 @@ class RequestType(Enum): | |||
GREEDY_UNTIL_MULTI_TURN = auto() | |||
|
|||
|
|||
Context: TypeAlias = object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced this type to account for both str
and Conversation
but in the future it could be for example huggingface_hub.DocumentQuestionAnsweringInputData
for Document Question Answering.
- We could put additional types like
Conversation
,Context
,etc. in alighteval/types.py
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An idea: currently . We could imagine having a task.fewshot_sampler.fewshot_context()
is the ultimate responsible for creating the context for a doc even if the task hasn't a few-shot settingcontext_augmenters
attribute for the task giving it to prompt manager, containing everything that could augment the context like a few-shot manager or a RAG retriever and have them one by one apply themselves to the context ,starting from initial context which is the instruction+query, in the prompt manager's add_context_to_doc()
method.
de60b36
to
f881dc3
Compare
forgot to add in base_model.py
a111ce0
to
c3ac5d6
Compare
c3ac5d6
to
8c0018e
Compare
Pipeline
PromptManager
Hi there!
This PR attempts to address the need for evaluating endpoint models on chat completion tasks, i.e. using chat templating.
BaseModel
andNanotronModel
supported it through
FewshotManager.fewshot_context()
which applies chat template to the fewshot & query examples. For endpoint models we could either usethe very
InferenceClient.text_generation()
or the nativeIneferenceClient.chat_completion()
apis. This PR attempts to use the latter.Generally, could be fruitful if Lighteval makes use of
huggingface_hub
types extensively? At least forGenerativeResponse
'sresult
attribute to be of typeChatcompletionOutput|TextGenerationOutput
and metrics work with inputs of these types as well so that we could evaluate function calling and tools easily. Or forGreedyUntilRequest
'scontext
attribute to be of typeConversation : TypeAlias = List[ChatCompletionInputMessage]
to be able to feed tools params.