MMLU answer extraction regex fails with repeated "Answer: LETTER" pattern #33

lucasresck · 2024-12-10T18:37:25Z

Description

The regular expression used to extract answers for MMLU in common.py fails when the pattern "Answer: LETTER" appears more than once in the LLM output, which distorts the reported model performance.

Example

The following example shows the issue with a German response: the model correctly selects "C", but the regex extracts "A" as the answer.

Explanation

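The answer is taken only from the first match of the regular expression. Because the \s* after the answer label also matches newlines, a first "Answer:" followed by a line break can be paired with the first letter of the next line, even when that letter starts a word rather than being the chosen option.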

simple-evals/common.py, lines 25 to 71 at commit a8e85cc:

    MULTILINGUAL_ANSWER_PATTERN_TEMPLATE = (
        "(?i){}\s*([A-D]|[أ-د]|[অ]|[ব]|[ড]|[ঢ]|[Ａ]|[Ｂ]|[Ｃ]|[Ｄ])"
    )
    # All the different ways "Answer" is written in different languages
    MULTILINGUAL_ANSWER_REGEXES = [
        "Answer\s*:",
        "Answer\s*:​​​​​​",  # Korean invisible character
        "উত্তর\s*:",
        "उत्तर\s*:",
        "উত্তরঃ",
        "উত্তর\s*:",
        "Antwort\s*:",
        "답변\s*:",
        "정답\s*:",
        "답\s*:",
        "答案\s*：",
        "答案\s*:",
        "答\s*：",
        "答\s*:",
        "答复\s*：",
        "答曰\s*：",
        "الإجابة:",
        "الجواب:",
        "إجابة:",
        "الإجابة النهائية:",
        "الإجابة الصحيحة:",
        "الإجابة الصحيحة هي:",
        "الإجابة هي:",
        "Respuesta\s*:",
        "Risposta\s*:",
        "答え\s*:",
        "答え\s*：",
        "回答\s*:",
        "回答\s*：",
        "解答\s*:",
        "Jawaban\s*:",
        "Réponse\s*:",
        "Resposta\s*:",
        "Jibu\s*:",
        "Idahun\s*:",
        "Ìdáhùn\s*:",
        "Idáhùn\s*:",
        "Àmọ̀nà\s*:",
        "Àdáhùn\s*:",
        "Ànúgọ\s*:",
        "Àṣàyàn\s*:",
    ]
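The excerpt only defines the patterns; the matching happens at the call site. Below is a minimal sketch of that step, assuming the caller formats MULTILINGUAL_ANSWER_PATTERN_TEMPLATE with each entry of MULTILINGUAL_ANSWER_REGEXES in order and keeps the first re.search hit. The helper extract_first_answer and the three-label ANSWER_REGEXES list are hypothetical, for illustration only:

```python
import re
from typing import List, Optional

# Template from the common.py excerpt, written as a raw string so "\s" reaches
# the regex engine unchanged (the resulting pattern is the same).
MULTILINGUAL_ANSWER_PATTERN_TEMPLATE = (
    r"(?i){}\s*([A-D]|[أ-د]|[অ]|[ব]|[ড]|[ঢ]|[Ａ]|[Ｂ]|[Ｃ]|[Ｄ])"
)

# Hypothetical subset of MULTILINGUAL_ANSWER_REGEXES, for illustration only.
ANSWER_REGEXES: List[str] = [r"Answer\s*:", r"Antwort\s*:", r"Jawaban\s*:"]


def extract_first_answer(response_text: str) -> Optional[str]:
    """Hypothetical helper mirroring the assumed call site: the first label
    that matches anywhere wins, and re.search returns its leftmost match."""
    for answer_regex in ANSWER_REGEXES:
        regex = MULTILINGUAL_ANSWER_PATTERN_TEMPLATE.format(answer_regex)
        match = re.search(regex, response_text)
        if match:
            return match.group(1)
    return None


print(extract_first_answer("Antwort: C"))  # C
print(extract_first_answer("Antwort:\n\nAntwort: C"))  # A
```

Because re.search returns the leftmost match and the loop stops at the first label that matches, nothing later in the response (such as the final "Antwort: C") is ever consulted.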

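In the German example above, it extracts "A" from "Antwort:\n\nAntwort: C" because the match is "Antwort:\n\nA": the first "Antwort:", the two newlines consumed by \s*, and the capital "A" at the start of the second "Antwort".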
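A short reproduction of that match, using the template and the German label from the excerpt (written as raw strings, which yields the same regex). It prints the full matched span and also shows that collecting every match with re.finditer and taking the last one would still return "A" for this input:

```python
import re

# Template and German label from the common.py excerpt (raw strings, same regex).
TEMPLATE = r"(?i){}\s*([A-D]|[أ-د]|[অ]|[ব]|[ড]|[ঢ]|[Ａ]|[Ｂ]|[Ｃ]|[Ｄ])"
regex = TEMPLATE.format(r"Antwort\s*:")
text = "Antwort:\n\nAntwort: C"

match = re.search(regex, text)
# \s* consumes "\n\n", so the letter group lands on the "A" of the second "Antwort".
print(repr(match.group(0)))  # 'Antwort:\n\nA'
print(match.group(1))  # A

# Taking the last match is not enough by itself: the first match already
# consumed that "A", so the second "Antwort: C" can no longer match.
letters = [m.group(1) for m in re.finditer(regex, text)]
print(letters)  # ['A']
```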

Impact

This bug significantly affects the evaluation results for certain languages. In my experiments, it affected about 20% of German samples and about 4% of Indonesian samples; other languages appear less affected.
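One way to estimate figures like these from saved model responses is to re-extract each response with the current template and with a stricter variant, and count how often the extracted letter changes. This is a sketch under assumptions, not the method used for the numbers above: STRICT_TEMPLATE (which appends the negative lookahead (?!\w) so the captured letter cannot be the start of a longer word) and the helpers extract and affected_fraction are hypothetical, and the lookahead is only intended for Latin-script responses.

```python
import re
from typing import Iterable, Optional

# Current template from common.py (raw string; same regex).
CURRENT_TEMPLATE = r"(?i){}\s*([A-D]|[أ-د]|[অ]|[ব]|[ড]|[ঢ]|[Ａ]|[Ｂ]|[Ｃ]|[Ｄ])"
# Hypothetical stricter variant: reject a letter that is immediately followed by
# another word character (e.g. the "A" that starts "Antwort").
STRICT_TEMPLATE = CURRENT_TEMPLATE + r"(?!\w)"


def extract(template: str, answer_regex: str, text: str) -> Optional[str]:
    """Return the letter captured by the leftmost match, or None."""
    match = re.search(template.format(answer_regex), text)
    return match.group(1) if match else None


def affected_fraction(responses: Iterable[str], answer_regex: str) -> float:
    """Hypothetical helper: share of responses whose extracted letter changes
    when switching from the current template to the stricter one."""
    items = list(responses)
    if not items:
        return 0.0
    changed = sum(
        extract(CURRENT_TEMPLATE, answer_regex, r)
        != extract(STRICT_TEMPLATE, answer_regex, r)
        for r in items
    )
    return changed / len(items)


german = ["Antwort:\n\nAntwort: C", "Antwort: C", "Antwort: B"]
print(affected_fraction(german, r"Antwort\s*:"))  # 0.3333333333333333
```

Disagreements are candidates for manual review rather than confirmed misextractions.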


lucasresck mentioned this issue on Dec 10, 2024 in #34, "Fix MMLU answer extraction regex for repeated "Answer: LETTER" pattern" (Open).
