-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
push_alrage_final_version_for_OALL #473
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I'd like @NathanHB to take a look since he worked on LLM judges more than I did
|
||
from lighteval.metrics.metrics import Metrics | ||
from lighteval.metrics.llm_as_judge import JudgeLM | ||
from lighteval.metrics.metrics import Metric, MetricCategory, Metrics # Import MetricCategory and Metric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove unused comment
self.category = MetricCategory.LLM_AS_JUDGE # Add the category attribute | ||
self.corpus_level_fn = self.aggregate_scores # Define the corpus level function | ||
self.sample_level_fn = self._sample_level_fn | ||
self.higher_is_better = (True,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you using a tuple for higher_is_better?
|
||
question = str(line["question"]) | ||
|
||
# Convert candidates to string if it isn't already |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Convert candidates to string if it isn't already | |
# From a list of candidates, converts each candidate to a string | |
# From a string of candidates, splits it on newlines (assumes each candidate is a single line) |
if isinstance(line["candidates"], list): | ||
candidates = [str(c) for c in line["candidates"]] | ||
else: | ||
candidates = str(line["candidates"]).split("\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to cast to str here? Don't you need to catch possible failures?
def qa_prompt_arabic(line: Dict, task_name: str = None) -> Doc: | ||
"""Format the prompt for question answering with candidates""" | ||
|
||
# Check the input line structure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All comments should be 3rd person singular, so Check -> Checks, Convert -> Converts, etc
task_name=task_name or "alrage", | ||
query=query, | ||
instruction=instruction, | ||
choices=[gold_answer], # Ensure this is populated correctly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"# Ensure this is populated correctly" -> do you mean adding the gold_answer in choices ensures this?
"role": "system", | ||
"content": """أنت مقيّم محايد خبير. مهمتك هي: | ||
1. تقييم دقة الإجابة مقارنة بالإجابة الصحيحة | ||
2. التحقق من أن الإجابة مدعومة بالسياق المقدم | ||
3. تقييم جودة وشمولية الإجابة | ||
|
||
قم بتقييم الإجابة على مقياس من 0 إلى 10.""", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if you could add a small translation of the different system prompts you are using for your judges (if you have the time, I think it would benefit the community to be able to use a similar method)
|
||
try: | ||
# Extract the score from the response content | ||
score = float(next(num for num in response_content.split() if num.replace(".", "", 1).isdigit())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How robust is this?/What did you test it against?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
No description provided.