Skip to content

FreedomIntelligence/PlatoLM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator

✨ Latest News

  • [09/08/2024]: All the experimental data is public on Google Drive.
  • [05/15/2024]: We are accepted by ACL-2024, you can find our final version in arxiv.
  • [01/16/2024]: We are rejected by ICLR-2024 with scores 8666(ranked top 13%-16%).
  • [10/12/2023]: Upload the dataset SocraticChat in hugging face.
  • [10/10/2023]: Update the tech report v4.
  • [10/08/2023]: The user simulator UserGPT, dataset RealChat and the respondent model ReaLM are renamed to Socratic, SocraticChat, and PlatoLM by Benyou Wang, the provider of 4 x A100s.
  • [08/21/2023]: PlatoLM-7b Rank #1 on AlpacaEval benchmark among 7B scale, achieving 81.94% win rates against text-davinci-003 (has entered into the official benchmark).
  • [08/21/2023]: PlatoLM-7b Rank #1 on MT-Bench benchmark among 7B scale (hasn't entered into the official benchmark yet).
  • [08/21/2023]: Release the model weights.
  • [08/21/2023]: Release the tech report v1.

⚡ Introduction

Welcome to our realm🤗

We propose a new paradigm for training a user simulator.

After applying this paradigm to ShareGPT and LLaMA-7B, a novel user simulator, Socratic, emerged. Through iterative interactions between Socratic and gpt-3.5-turbo, a multi-round conversation dataset named SocraticChat was generated. Leveraging this dataset for fine-tuning LLAMA-7B-2 resulted in the PlatoLM model, which exhibits superior performance.

With fewer samples(50.7K) distilled from gpt-3.5, shorter context length(2048), and smaller model scale(7B), we even beat GPT 3.5 in Alpaca-Eval benchmark.

cool cool

📖 Methodology

The key to our idea is to flip the chessboard.

We just mask the questions of real users and accordingly, only calculate their loss for the purpose of modifying the learning objective. In addition, we use a dyadic prompt template to instruct our backbone.

The main difference between us and other research is shown below. pipeline

The pipeline can be analogous to Socratic teaching, which means teaching students via questioning. We argue that after learning the real human's high-quality instructions based on the knowledgeable llama backbone, more human-like LLMs will master the sophisticated teaching ability. Therefore, we named the query model Socratic, which means the follower of Socrates. Likewise, we labeled the dataset as SocraticChat, and the resulting model was dubbed PlatoLM.

analogy

Experiments show that a more human-like questioning pattern in dynamic multi-round conversations can teach the response model better compared to static role-playing, which can be attributed to the natural and rich topic structures of the questioning pattern from humans in human-machine dialogue where they hold topic dominance.

📄 Case Study

The typical samples for Socratic Dialogues and our dataset SocraticChat are shown below. sample2

🚀 Training

# To fine-tune Socratic
cd model/sft_socratic
bash scripts/sft_7b.sh 

# To fine-tune PlatoLM
cd model/sft_platolm
bash scripts/sft_7b.sh 

🧐 Inferencing

# To infer PlatoLM
python -m model.sft_platolm.source.deploy.cli --model FreedomIntelligence/PlatoLM-7b

# To infer Socratic
# The model's weights of Socratic has not been published yet. 
python -m model.sft_socratic.source.deploy.cli --model balabala

🎉 Acknowledgement

We are aware that our works are inspired by the following works, including but not limited to

Without these, nothing could happen in this repository.

💭 Citation

@inproceedings{kong2024platolm,
  title={PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator},
  author={Kong, Chuyi and Fan, Yaxin and Wan, Xiang and Jiang, Feng and Wang, Benyou},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={7841--7863},
  year={2024}
}

We are from the School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHKSZ), and the Shenzhen Research Institute of Big Data (SRIBD).

About

A trainable user simulator

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •