Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

push_alrage_final_version_for_OALL #473

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Manel-Hik
Copy link

No description provided.

Copy link
Member

@clefourrier clefourrier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I'd like @NathanHB to take a look since he worked on LLM judges more than I did


from lighteval.metrics.metrics import Metrics
from lighteval.metrics.llm_as_judge import JudgeLM
from lighteval.metrics.metrics import Metric, MetricCategory, Metrics # Import MetricCategory and Metric
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove unused comment

self.category = MetricCategory.LLM_AS_JUDGE # Add the category attribute
self.corpus_level_fn = self.aggregate_scores # Define the corpus level function
self.sample_level_fn = self._sample_level_fn
self.higher_is_better = (True,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using a tuple for higher_is_better?


question = str(line["question"])

# Convert candidates to string if it isn't already
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Convert candidates to string if it isn't already
# From a list of candidates, converts each candidate to a string
# From a string of candidates, splits it on newlines (assumes each candidate is a single line)

if isinstance(line["candidates"], list):
candidates = [str(c) for c in line["candidates"]]
else:
candidates = str(line["candidates"]).split("\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to cast to str here? Don't you need to catch possible failures?

def qa_prompt_arabic(line: Dict, task_name: str = None) -> Doc:
"""Format the prompt for question answering with candidates"""

# Check the input line structure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All comments should be 3rd person singular, so Check -> Checks, Convert -> Converts, etc

task_name=task_name or "alrage",
query=query,
instruction=instruction,
choices=[gold_answer], # Ensure this is populated correctly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"# Ensure this is populated correctly" -> do you mean adding the gold_answer in choices ensures this?

Comment on lines +918 to +925
"role": "system",
"content": """أنت مقيّم محايد خبير. مهمتك هي:
1. تقييم دقة الإجابة مقارنة بالإجابة الصحيحة
2. التحقق من أن الإجابة مدعومة بالسياق المقدم
3. تقييم جودة وشمولية الإجابة

قم بتقييم الإجابة على مقياس من 0 إلى 10.""",
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you could add a small translation of the different system prompts you are using for your judges (if you have the time, I think it would benefit the community to be able to use a similar method)


try:
# Extract the score from the response content
score = float(next(num for num in response_content.split() if num.replace(".", "", 1).isdigit()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How robust is this?/What did you test it against?

@HuggingFaceDocBuilderDev
Copy link
Collaborator

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants