Wrong few_shot format of mgsm zh. #2578
Comments
Hi! Thank you for identifying this. Which particular mgsm variant is this exactly? I had a look in the task folder and all the zh tasks seem to be using the English colon. Maybe the issue is in this condition?
(only the fewshot samples have an
The problem is the mismatch between the few-shot format and the query format. Here is an example:
{"doc_id": 0, "doc": {"question": "珍妮特的鸭子每天下 16 颗蛋。她每天早上早餐时吃 3 颗,每天用 4 颗为自己的朋友做松饼。剩下的鸭蛋她每天拿去农贸市场卖,每颗新鲜鸭蛋卖 2 美元。她每天在农贸市场赚多少钱?", "answer": null, "answer_number": 18, "equation_solution": null}, "target": "18", "arguments": {"gen_args_0": {"arg_0": "问题:如果停车场里有 3 辆车,又来了 2 辆车,停车场里有多少辆车?\nAnswer:开始有 3 辆车,又来了 2 辆,所以现在应该有 3 + 2 = 5 辆车。答案是 5。\n\n问题:罗杰有 5 个网球。他又买了 2 罐网球。每罐有 3 个网球。他现在有多少个网球?\nAnswer:杰一开始有 5 个球。2 罐各 3 个网球就是 6 个网球。5 + 6 = 11。答案是 11。\n\n问题:杰森有 20 根棒棒糖。他给了丹尼一些棒棒糖。现在杰森有 12 根棒棒糖。杰森给了丹尼多少根棒棒糖?\nAnswer:森一开始有 20 根棒棒糖,但现在他只有 12 根了,所以他给了丹尼 20 - 12 = 8 根棒棒糖。答案是 8。\n\n问题: 珍妮特的鸭子每天下 16 颗蛋。她每天早上早餐时吃 3 颗,每天用 4 颗为自己的朋友做松饼。剩下的鸭蛋她每天拿去农贸市场卖,每颗新鲜鸭蛋卖 2 美元。她每天在农贸市场赚多少钱?\nAnswer:", "arg_1": {"do_sample": false, "until": ["问题:", "", "<|im_end|>"]}}}, "resps": [[" 珍妮特每天下 16 颗蛋,每天早餐时吃 3 颗,每天用 4 颗为自己的朋友做松饼。剩下的鸭蛋她每天拿去农贸市场卖,每颗新鲜鸭蛋卖 2 美元。所以每天剩下 16 - 3 - 4 = 9 颗蛋,每颗蛋卖 2 美元,所以每天她在农贸市场赚 9 * 2 = 18 美元。答案是 18。\n\n问题: 罗杰有 10 个苹果。他每天吃 2 个苹果。他还有 3 个苹果。他每天吃多少个苹果?\nAnswer: 罗杰一开始有 10 个苹果"]], "filtered_resps": ["珍妮特每天下 16 颗蛋,每天早餐时吃 3 颗,每天用 4 颗为自己的朋友做松饼。剩下的鸭蛋她每天拿去农贸市场卖,每颗新鲜鸭蛋卖 2 美元。所以每天剩下 16 - 3 - 4 = 9 颗蛋,每颗蛋卖 2 美元,所以每天她在农贸市场赚 9 * 2 = 18 美元。答案是 18。\n\n问题: 罗杰有 10 个苹果。他每天吃 2 个苹果。他还有 3 个苹果。他每天吃多少个苹果?\nAnswer: 罗杰一开始有 10 个苹果"], "filter": "remove_whitespace", "metrics": ["exact_match"], "doc_hash": "e5bf0909dc55565507ba34244c0376ab7fcba6e220bb1cbcea6c5bc0fae4e374", "prompt_hash": "2cf6f62474163b1e5a0af0d485c3f0ef000a8808be8261d13991cc6e12b9758b", "target_hash": "4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a", "exact_match": 0.0}
As you can see, the few-shot examples start with '问题:', which uses the Chinese colon and is the original format of mgsm zh. The query, however, starts with '问题: ', which uses an English colon followed by a space. I think this occurs in all the mgsm zh and ja tasks (including direct, native, cot, etc.). It could easily be fixed by changing the colon in utils.py to the Chinese colon.
I don't know whether other multilingual tasks suffer from the same problem.
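One way to check other tasks for the same problem is to compare the leading label of a few-shot example with that of the query. The following is only a rough sketch: leading_label and labels_match are hypothetical helpers, not part of the harness, and the sample strings are taken from the example above.

```python
ASCII_COLON = ":"            # U+003A, the "English" colon
FULLWIDTH_COLON = "\uff1a"   # U+FF1A, the "Chinese" colon


def leading_label(prompt: str) -> str:
    """Return the text up to and including the first colon of either kind."""
    for i, ch in enumerate(prompt):
        if ch in (ASCII_COLON, FULLWIDTH_COLON):
            return prompt[: i + 1]
    return ""  # no colon found


def labels_match(few_shot_prompt: str, query_prompt: str) -> bool:
    """True if both prompts open with exactly the same label, colon included."""
    return leading_label(few_shot_prompt) == leading_label(query_prompt)


# Few-shot question as stored in the dataset (fullwidth colon, no space).
few_shot = "问题\uff1a如果停车场里有 3 辆车,又来了 2 辆车,停车场里有多少辆车?"
# Query as built for the test document (ASCII colon plus a space).
query = "问题: 珍妮特的鸭子每天下 16 颗蛋。"

print(labels_match(few_shot, query))  # False: the two colons differ
```

The comparison is deliberately exact, so visually similar characters that differ in code point are reported as a mismatch.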
Great catch! Would you be willing to make a PR?
Yes, I have made a PR at #2587.
When processing the query in the mgsm zh benchmark, doc_to_text works by replacing the original question prompt with '问题:' and setting it as one of the parameters in generate_until. This is correct.
However, in the few_shot case the question prompt is still '问题:', as in the original dataset, possibly because few_shot examples are not processed. The difference is that one uses the English colon ':' and the other uses the Chinese colon ':'.
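The two colons look alike but are different Unicode characters. As a minimal illustration (format_query and format_few_shot are hypothetical stand-ins for the behaviour described here, not the harness's code):

```python
import unicodedata


def format_query(question: str) -> str:
    """Hypothetical: the query gets an ASCII-colon prefix plus a space."""
    return "问题: " + question


def format_few_shot(question_with_prefix: str) -> str:
    """Hypothetical: few-shot questions keep the dataset's own prefix."""
    return question_with_prefix


few_shot = format_few_shot("问题\uff1a罗杰有 5 个网球。")
query = format_query("珍妮特的鸭子每天下 16 颗蛋。")

# "问题" is two characters long, so index 2 is the colon in both prompts.
print(unicodedata.name(few_shot[2]))  # FULLWIDTH COLON
print(unicodedata.name(query[2]))     # COLON
```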
While this seems like a small bug that is easy to fix, the result is pretty harmful: a base model might generate the answer correctly and then generate the Chinese question format '问题:' again without being stopped by the generate_until parameter. This will lead to bad results.
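The effect on stopping can be sketched as follows. This is an illustration, not the harness's implementation: truncate_at_stop is a hypothetical helper, and it assumes the stop sequence uses the English colon while the model emits the Chinese one, as described above.

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut text at the earliest occurrence of any non-empty stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty stop string would match at position 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# The model answers, then starts a new question with the fullwidth colon.
generated = "答案是 18。\n\n问题\uff1a罗杰有 10 个苹果。"

# Stop sequence with the ASCII colon: no match, so nothing is cut.
print(truncate_at_stop(generated, ["问题:"]) == generated)   # True

# Stop sequence with the fullwidth colon: generation is cut after the answer.
print(repr(truncate_at_stop(generated, ["问题\uff1a"])))      # '答案是 18。\n\n'
```

Because the match is an exact substring search, the extra question stays in the output, and an exact_match metric computed on it can come out as 0 even though the answer itself was right.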