push_alrage_final_version_for_OALL #473

Manel-Hik · 2024-12-21T14:50:22Z

No description provided.

clefourrier

LGTM, but I'd like @NathanHB to take a look, since he has worked on LLM judges more than I have.

clefourrier · 2024-12-26T11:56:35Z

community_tasks/arabic_evals.py


-from lighteval.metrics.metrics import Metrics
+from lighteval.metrics.llm_as_judge import JudgeLM
+from lighteval.metrics.metrics import Metric, MetricCategory, Metrics  # Import MetricCategory and Metric


Please remove the unused comment.

clefourrier · 2024-12-26T11:57:46Z

community_tasks/arabic_evals.py

+        self.category = MetricCategory.LLM_AS_JUDGE  # Add the category attribute
+        self.corpus_level_fn = self.aggregate_scores  # Define the corpus level function
+        self.sample_level_fn = self._sample_level_fn
+        self.higher_is_better = (True,)


Why are you using a tuple for higher_is_better?
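
To illustrate the concern, here is a minimal standalone sketch (not lighteval code; the variable names are made up for illustration) of how a one-element tuple behaves differently from a plain boolean. Whether lighteval actually breaks on a tuple here is not something this snippet shows; it only demonstrates the general Python pitfall.

```python
# Illustrative only: why a one-element tuple is risky as a boolean flag.
flag_as_tuple = (True,)  # what the trailing comma produces
flag_as_bool = True      # what a boolean flag usually looks like

# The trailing comma makes this a tuple, not a bool.
is_tuple = isinstance(flag_as_tuple, tuple)
is_bool = isinstance(flag_as_tuple, bool)

# Any non-empty tuple is truthy, even one that wraps False,
# so a later `if higher_is_better:` check could never be False.
wrapped_false_is_truthy = bool((False,))

print(is_tuple, is_bool, wrapped_false_is_truthy)
```

If a plain flag is intended, dropping the parentheses and the trailing comma (`self.higher_is_better = True`) avoids the ambiguity.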

clefourrier · 2024-12-26T12:01:47Z

community_tasks/arabic_evals.py

+
+    question = str(line["question"])
+
+    # Convert candidates to string if it isn't already


Suggested change

-    # Convert candidates to string if it isn't already
+    # From a list of candidates, converts each candidate to a string
+    # From a string of candidates, splits it on newlines (assumes each candidate is a single line)

clefourrier · 2024-12-26T12:02:37Z

community_tasks/arabic_evals.py

+    if isinstance(line["candidates"], list):
+        candidates = [str(c) for c in line["candidates"]]
+    else:
+        candidates = str(line["candidates"]).split("\n")


Why do you need to cast to str here? Don't you need to catch possible failures?
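
One possible way to address this, sketched under assumptions: the helper name `normalize_candidates` and the exact error types below are hypothetical and not part of the PR. The point is that `str(None)` silently produces the string `"None"`, so an unconditional cast can hide a missing field instead of surfacing it.

```python
from typing import Any, List


def normalize_candidates(raw: Any) -> List[str]:
    """Returns the candidates as a list of non-empty strings (hypothetical helper)."""
    if raw is None:
        # str(None) would silently become the candidate "None".
        raise ValueError("candidates is missing")
    if isinstance(raw, str):
        # Assumes each candidate sits on its own line.
        items = raw.split("\n")
    elif isinstance(raw, (list, tuple)):
        items = [str(c) for c in raw]
    else:
        raise TypeError(f"unsupported candidates type: {type(raw).__name__}")
    # Drops blank entries left over from trailing newlines.
    cleaned = [c.strip() for c in items if c.strip()]
    if not cleaned:
        raise ValueError("candidates is empty")
    return cleaned
```

Raising early keeps a malformed row from reaching the judge as a prompt with bogus candidates; whether to raise or to skip the row is a design choice for the task author.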

clefourrier · 2024-12-26T12:03:16Z

community_tasks/arabic_evals.py

+def qa_prompt_arabic(line: Dict, task_name: str = None) -> Doc:
+    """Format the prompt for question answering with candidates"""
+
+    # Check the input line structure


All comments should be in the third person singular, so Check -> Checks, Convert -> Converts, etc.

clefourrier · 2024-12-26T12:03:54Z

community_tasks/arabic_evals.py

+        task_name=task_name or "alrage",
+        query=query,
+        instruction=instruction,
+        choices=[gold_answer],  # Ensure this is populated correctly


"# Ensure this is populated correctly" -> do you mean that adding gold_answer to choices ensures this?

clefourrier · 2024-12-26T12:05:45Z

community_tasks/arabic_evals.py

+            "role": "system",
+            "content": """أنت مقيّم محايد خبير. مهمتك هي:
+1. تقييم دقة الإجابة مقارنة بالإجابة الصحيحة
+2. التحقق من أن الإجابة مدعومة بالسياق المقدم
+3. تقييم جودة وشمولية الإجابة
+
+قم بتقييم الإجابة على مقياس من 0 إلى 10.""",
+        },

English translation of the system prompt above: "You are a neutral, expert evaluator. Your task is to: 1. Evaluate the accuracy of the answer against the correct answer. 2. Verify that the answer is supported by the provided context. 3. Evaluate the quality and comprehensiveness of the answer. Rate the answer on a scale from 0 to 10."


It would be great if you could add a small translation of the different system prompts you are using for your judges (if you have the time, I think it would benefit the community to be able to use a similar method)

clefourrier · 2024-12-26T12:06:11Z

community_tasks/arabic_evals.py

+
+    try:
+        # Extract the score from the response content
+        score = float(next(num for num in response_content.split() if num.replace(".", "", 1).isdigit()))


How robust is this? What did you test it against?
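
For reference, a small sketch of where the whitespace-split approach can miss a score, alongside a regex-based alternative. Both function names are hypothetical, and the 0 to 10 range check is an assumption taken from the scale stated in the system prompt; this has not been tested against real judge outputs.

```python
import re
from typing import Optional


def extract_score_split(response_content: str) -> float:
    """Mirrors the token-based extraction from the diff (raises StopIteration on no match)."""
    return float(
        next(num for num in response_content.split() if num.replace(".", "", 1).isdigit())
    )


def extract_score_regex(response_content: str) -> Optional[float]:
    """Returns the first number found if it lies in [0, 10], otherwise None."""
    match = re.search(r"\d+(?:\.\d+)?", response_content)
    if match is None:
        return None
    score = float(match.group())
    # Assumes the 0-10 scale requested in the judge prompt.
    return score if 0 <= score <= 10 else None


# "8/10" is a single token containing "/", so the split-based version finds nothing.
try:
    extract_score_split("Score: 8/10")
    split_found_score = True
except StopIteration:
    split_found_score = False

regex_score = extract_score_regex("Score: 8/10")
print(split_found_score, regex_score)
```

Taking the first number is still a heuristic: a reply such as "Out of 10, I give 8" would yield 10. Asking the judge for a fixed output format and parsing that is usually more reliable.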

HuggingFaceDocBuilderDev · 2024-12-26T12:08:39Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Manel-Hik and others added 3 commits December 21, 2024 12:18

push_alrage_final_version

f711450

Fix formatting and linting issues via pre-commit hooks

b9c5710

fix_model_id

1839aae

clefourrier reviewed Dec 26, 2024
