
Wrong few_shot format of mgsm zh. #2578

Open

timturing opened this issue Dec 18, 2024 · 4 comments
Labels
validation For validation of task implementations.

Comments

@timturing

When processing the query in the mgsm zh benchmark, doc_to_text rewrites the question prompt as '问题: ', and the same string is set as one of the stop sequences in generate_until. This is correct.
However, in the few-shot case the examples are apparently not processed in the same way, so their question prompt remains '问题：' as it appears in the original dataset. The difference is that one uses the English colon ':' while the other uses the Chinese full-width colon '：'.
While this looks like a small bug that is easy to fix, its effect is quite harmful: a base model may answer correctly and then generate the Chinese question prefix '问题：' again without being stopped by the generate_until parameter, which ruins the result.
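To make the mismatch concrete, here is a minimal Python sketch (my own illustration, not harness code; the strings are hypothetical but follow the few-shot format) showing that a stop sequence using the ASCII colon never matches a continuation that uses the full-width colon:

```python
# The two prefixes differ in a single codepoint: U+003A vs. U+FF1A.
ascii_stop = "问题:"         # what generate_until is configured with
fullwidth_prefix = "问题："  # what the few-shot examples contain

continuation = "答案是 18。\n\n问题：罗杰有 10 个苹果。"

# A naive substring check, standing in for what the backend's stop logic does:
print(ascii_stop in continuation)        # False -> generation is not stopped
print(fullwidth_prefix in continuation)  # True  -> this is what should match

print(hex(ord(":")), hex(ord("：")))     # 0x3a 0xff1a
```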

@baberabb
Contributor

baberabb commented Dec 19, 2024

Hi! Thank you for identifying this. Which particular mgsm variant is this exactly? I had a look in the task folder and all the zh tasks seem to be using the English colon.

Maybe the issue is in this condition?

doc_to_text: '{% if answer is not none %}{{question+"\n逐步解答:"}}{% else %}{{"问题: "+question+"\n逐步解答:"}}{% endif %}'

(only the fewshot samples have an answer field)
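For reference, the branching can be reproduced with a plain jinja2 render. This is just a sketch, assuming the template is applied to docs shaped like the dumped record below (question/answer keys):

```python
from jinja2 import Template

# The doc_to_text template quoted above.
tpl = Template(
    '{% if answer is not none %}{{question+"\\n逐步解答:"}}'
    '{% else %}{{"问题: "+question+"\\n逐步解答:"}}{% endif %}'
)

# Few-shot docs carry an answer, so they take the first branch: no prefix is
# prepended and the question keeps whatever prefix the raw data has ('问题：').
print(tpl.render(question="问题：如果停车场里有 3 辆车……", answer="5"))

# The target doc has answer == None, so the ASCII-colon prefix '问题: ' is added.
print(tpl.render(question="珍妮特的鸭子每天下 16 颗蛋……", answer=None))
```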

baberabb added the validation label Dec 19, 2024
@timturing
Author

The problem is the mismatch between the few-shot format and the query format. Here is an example:

{"doc_id": 0, "doc": {"question": "珍妮特的鸭子每天下 16 颗蛋。她每天早上早餐时吃 3 颗,每天用 4 颗为自己的朋友做松饼。剩下的鸭蛋她每天拿去农贸市场卖,每颗新鲜鸭蛋卖 2 美元。她每天在农贸市场赚多少钱?", "answer": null, "answer_number": 18, "equation_solution": null}, "target": "18", "arguments": {"gen_args_0": {"arg_0": "问题:如果停车场里有 3 辆车,又来了 2 辆车,停车场里有多少辆车?\nAnswer:开始有 3 辆车,又来了 2 辆,所以现在应该有 3 + 2 = 5 辆车。答案是 5。\n\n问题:罗杰有 5 个网球。他又买了 2 罐网球。每罐有 3 个网球。他现在有多少个网球?\nAnswer:杰一开始有 5 个球。2 罐各 3 个网球就是 6 个网球。5 + 6 = 11。答案是 11。\n\n问题:杰森有 20 根棒棒糖。他给了丹尼一些棒棒糖。现在杰森有 12 根棒棒糖。杰森给了丹尼多少根棒棒糖?\nAnswer:森一开始有 20 根棒棒糖,但现在他只有 12 根了,所以他给了丹尼 20 - 12 = 8 根棒棒糖。答案是 8。\n\n问题: 珍妮特的鸭子每天下 16 颗蛋。她每天早上早餐时吃 3 颗,每天用 4 颗为自己的朋友做松饼。剩下的鸭蛋她每天拿去农贸市场卖,每颗新鲜鸭蛋卖 2 美元。她每天在农贸市场赚多少钱?\nAnswer:", "arg_1": {"do_sample": false, "until": ["问题:", "", "<|im_end|>"]}}}, "resps": [[" 珍妮特每天下 16 颗蛋,每天早餐时吃 3 颗,每天用 4 颗为自己的朋友做松饼。剩下的鸭蛋她每天拿去农贸市场卖,每颗新鲜鸭蛋卖 2 美元。所以每天剩下 16 - 3 - 4 = 9 颗蛋,每颗蛋卖 2 美元,所以每天她在农贸市场赚 9 * 2 = 18 美元。答案是 18。\n\n问题: 罗杰有 10 个苹果。他每天吃 2 个苹果。他还有 3 个苹果。他每天吃多少个苹果?\nAnswer: 罗杰一开始有 10 个苹果"]], "filtered_resps": ["珍妮特每天下 16 颗蛋,每天早餐时吃 3 颗,每天用 4 颗为自己的朋友做松饼。剩下的鸭蛋她每天拿去农贸市场卖,每颗新鲜鸭蛋卖 2 美元。所以每天剩下 16 - 3 - 4 = 9 颗蛋,每颗蛋卖 2 美元,所以每天她在农贸市场赚 9 * 2 = 18 美元。答案是 18。\n\n问题: 罗杰有 10 个苹果。他每天吃 2 个苹果。他还有 3 个苹果。他每天吃多少个苹果?\nAnswer: 罗杰一开始有 10 个苹果"], "filter": "remove_whitespace", "metrics": ["exact_match"], "doc_hash": "e5bf0909dc55565507ba34244c0376ab7fcba6e220bb1cbcea6c5bc0fae4e374", "prompt_hash": "2cf6f62474163b1e5a0af0d485c3f0ef000a8808be8261d13991cc6e12b9758b", "target_hash": "4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a", "exact_match": 0.0}

As you can see, the few-shot examples start with '问题：', which uses the Chinese colon and is the original format of mgsm zh. However, the query starts with '问题: ', an English colon followed by a space.
So when testing a base model, as shown above, the model answers correctly but then generates a new question starting with '问题：', which is not caught by the stop sequence and makes the exact match fail.
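To tie this to the record above, here is a small simulation of the stop handling (again my own sketch, not the harness implementation):

```python
def truncate_at_stops(text: str, stops: list[str]) -> str:
    """Cut the generation at the earliest stop sequence, as a backend would."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

generation = "……所以每天她在农贸市场赚 9 * 2 = 18 美元。答案是 18。\n\n问题：罗杰有 10 个苹果……"

# The configured stops use the ASCII colon, so nothing matches and the
# spurious follow-up question survives into filtered_resps:
print("问题：" in truncate_at_stops(generation, ["问题:", "<|im_end|>"]))   # True

# With the full-width colon in the stop list, the output is cut as intended:
print("问题：" in truncate_at_stops(generation, ["问题：", "<|im_end|>"]))  # False
```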

I think this occurs across all the mgsm zh and ja tasks (including direct, native, cot, etc.). It could easily be fixed by changing utils.py to use the Chinese colon. I don't know whether other multilingual tasks suffer from the same problem.
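I don't have the exact contents of utils.py in front of me, so the following is only an illustrative sketch of the direction of the fix (all names are hypothetical): the prompt prefix and the stop sequence should both use the same full-width colon as the dataset.

```python
# Hypothetical sketch, not the actual utils.py contents.
QUESTION_PREFIX_ZH = "问题："  # full-width colon, matching the raw mgsm zh data

def doc_to_text(doc: dict) -> str:
    if doc.get("answer") is not None:  # few-shot exemplar: keep as-is
        return doc["question"] + "\n逐步解答:"
    return QUESTION_PREFIX_ZH + doc["question"] + "\n逐步解答:"

# The stop sequence must use the same prefix, or generation will run past it.
GENERATION_KWARGS = {"until": [QUESTION_PREFIX_ZH, "<|im_end|>"]}
```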

@baberabb
Contributor

Great catch! Would you be willing to make a PR?

@timturing
Author

Yes, I have made a PR at #2587.
