- [09/08/2024]: All experimental data is publicly available on Google Drive.
- [05/15/2024]: Our paper was accepted to ACL 2024; the final version is available on arXiv.
- [01/16/2024]: Our paper was rejected by ICLR 2024 with scores 8/6/6/6 (ranked in the top 13%-16%).
- [10/12/2023]: Uploaded the dataset SocraticChat to Hugging Face.
- [10/10/2023]: Updated the tech report to v4.
- [10/08/2023]: The user simulator UserGPT, the dataset RealChat, and the respondent model ReaLM were renamed to Socratic, SocraticChat, and PlatoLM by Benyou Wang, the provider of 4 x A100s.
- [08/21/2023]: PlatoLM-7b ranked #1 on the AlpacaEval benchmark among 7B-scale models, achieving an 81.94% win rate against text-davinci-003 (entered into the official benchmark).
- [08/21/2023]: PlatoLM-7b ranked #1 on the MT-Bench benchmark among 7B-scale models (not yet entered into the official benchmark).
- [08/21/2023]: Released the model weights.
- [08/21/2023]: Released the tech report v1.
Welcome to our realm🤗
We propose a new paradigm for training a user simulator. Applying this paradigm to ShareGPT and LLaMA-7B yields a novel user simulator, Socratic. Through iterative interactions between Socratic and gpt-3.5-turbo, we generate a multi-round conversation dataset named SocraticChat. Fine-tuning LLaMA-2-7B on this dataset produces the PlatoLM model, which exhibits superior performance.
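The iterative generation loop above can be sketched as follows. This is a minimal illustration of the data flow, not the repo's actual pipeline: `ask_socratic` and `ask_gpt35` are hypothetical stand-ins for the two models' inference calls, and the `{"from": ..., "value": ...}` turn format is only an assumption about the conversation schema.

```python
# Hedged sketch of the SocraticChat generation loop: the Socratic user
# simulator produces a question, gpt-3.5-turbo answers it, and both turns
# are appended to the growing conversation before the next round.
def generate_dialogue(ask_socratic, ask_gpt35, rounds=3):
    conversation = []
    for _ in range(rounds):
        question = ask_socratic(conversation)   # simulator plays the human side
        conversation.append({"from": "human", "value": question})
        answer = ask_gpt35(conversation)        # teacher model responds
        conversation.append({"from": "gpt", "value": answer})
    return conversation

# Toy stand-ins just to show the resulting data shape.
demo = generate_dialogue(lambda c: f"Q{len(c) // 2 + 1}?",
                         lambda c: f"A{(len(c) + 1) // 2}.",
                         rounds=2)
print([t["from"] for t in demo])  # ['human', 'gpt', 'human', 'gpt']
```

Alternating human/assistant turns like this is what lets the same transcript later serve as supervised data for either side of the dialogue.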
With fewer samples (50.7K) distilled from gpt-3.5, a shorter context length (2,048), and a smaller model scale (7B), we even beat GPT-3.5 on the AlpacaEval benchmark.
The key to our idea is to flip the chessboard: we simply mask the questions of real users and, accordingly, compute the loss only on them, thereby modifying the learning objective. In addition, we use a dyadic prompt template to instruct our backbone.
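The flipped objective can be sketched in a few lines. This is a minimal illustration, not the repo's training code: `build_labels` is a hypothetical helper, the token IDs are toy values, and the only assumption borrowed from common practice is PyTorch's `ignore_index=-100` convention for excluding tokens from the cross-entropy loss. Standard SFT supervises the assistant's answers; here the supervision targets are the human questions instead.

```python
# Sketch of the flipped learning objective: supervise only the human
# questions, and exclude the assistant answers from the loss.
IGNORE_INDEX = -100  # ignored by PyTorch's CrossEntropyLoss by default

def build_labels(turn_ids, turn_roles):
    """turn_ids: a list of token-id lists, one per conversation turn.
    turn_roles: a parallel list of 'human' or 'gpt' role tags."""
    labels = []
    for ids, role in zip(turn_ids, turn_roles):
        if role == "human":
            labels.extend(ids)                      # user question -> in the loss
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # answer -> masked out
    return labels

# Toy example: a 3-token question followed by a 2-token answer.
print(build_labels([[5, 6, 7], [8, 9]], ["human", "gpt"]))
# [5, 6, 7, -100, -100]
```

Swapping which role is masked is the entire change relative to ordinary response-side fine-tuning, which is why the same backbone and data format can train a questioner instead of a responder.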
The main difference between our work and other research is shown below.
The pipeline is analogous to Socratic teaching, i.e., teaching students by questioning. We argue that after learning real humans' high-quality instructions on top of the knowledgeable LLaMA backbone, more human-like LLMs will master this sophisticated teaching ability.
Therefore, we named the query model Socratic, meaning a follower of Socrates. Likewise, we labeled the dataset SocraticChat, and dubbed the resulting model PlatoLM.
Experiments show that a more human-like questioning pattern in dynamic multi-round conversations teaches the response model better than static role-playing does. We attribute this to the natural and rich topic structures of human questioning in human-machine dialogue, where humans hold topic dominance.
Typical samples of Socratic dialogues and our dataset SocraticChat are shown below.
# To fine-tune Socratic
cd model/sft_socratic
bash scripts/sft_7b.sh
# To fine-tune PlatoLM
cd model/sft_platolm
bash scripts/sft_7b.sh
# To infer PlatoLM
python -m model.sft_platolm.source.deploy.cli --model FreedomIntelligence/PlatoLM-7b
# To infer Socratic
# The model weights of Socratic have not been published yet.
python -m model.sft_socratic.source.deploy.cli --model balabala
Our work is inspired by the following works, including but not limited to:
- LLaMA: https://huggingface.co/meta-llama
- Self-instruct: https://github.com/yizhongw/self-instruct
- LLMZoo: https://github.com/FreedomIntelligence/LLMZoo
Without them, nothing in this repository could have happened.
@inproceedings{kong2024platolm,
title={PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator},
author={Kong, Chuyi and Fan, Yaxin and Wan, Xiang and Jiang, Feng and Wang, Benyou},
booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
pages={7841--7863},
year={2024}
}
We are from the School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHKSZ), and the Shenzhen Research Institute of Big Data (SRIBD).