
paper-source-trace-chinesegpt-rank4

Prerequisites

  • Linux
  • Python 3.10.12
  • PyTorch 2.0.0
  • CUDA 12.0
  • NVIDIA RTX 4090

Getting Started

Installation

Clone this repo.

git clone https://github.com/xgl0626/paper-source-trace-chinesegpt-rank4.git
cd paper-source-trace-chinesegpt-rank4

Install the dependencies with

pip install -r requirements.txt

PST Dataset

The dataset can be downloaded from BaiduPan (password: bft3), Aliyun, or Dropbox. The paper XML files were generated from the paper PDFs with the Grobid API.
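
Grobid writes TEI XML, in which each entry of a paper's reference list is a biblStruct element whose xml:id is the target of in-text citations. The following is a minimal sketch (not part of this repo) for inspecting one of these files with BeautifulSoup; the file name is a placeholder, and the "xml" parser requires lxml.

from bs4 import BeautifulSoup

# Read one Grobid TEI XML file (placeholder path).
with open("data/pst/paper-xml/example.xml", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "xml")

# Each <biblStruct> is one entry in the paper's reference list;
# its xml:id (e.g. "b12") is what <ref target="#b12"> citations point to.
for bib in soup.find_all("biblStruct"):
    ref_id = bib.attrs.get("xml:id", "")
    title_tag = bib.find("title")
    title = title_tag.get_text(" ", strip=True) if title_tag else "(no title)"
    print(ref_id, title)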

Directory structure

--paper-source-trace-main
    --scibert_scivocab_uncased
    --out
        --kddcup
            --scibert_eval1-434
            --scibert_eval1-429
            --scibert_eval1-423
    --data
        --pst
            --paper-xml  (place the competition dataset here)

Train & Inference

# Train three models with different parameters
python bert-eval-434.py
python bert-eval-429.py
python bert-eval-423.py
# model weights and results are written to out/kddcup/

# inference
# Comment out the other training calls, set the paths to the trained weights and the test file,
# then run the gen_kddcup_valid_submission_bert function:
if __name__ == "__main__":
    seed = 2023
    setup_seed(seed)  # fix random seeds for reproducibility
    # prepare_bert_input()           # data preparation, not needed at inference time
    # train(model_name="scibert")    # training, commented out for inference
    gen_kddcup_valid_submission_bert(model_name="scibert")
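
setup_seed is defined inside the training scripts. A minimal sketch of what a seeding helper like this typically does (an assumption, not the repo's exact code) is:

import random
import numpy as np
import torch

def setup_seed(seed: int) -> None:
    # Make the Python, NumPy and PyTorch (CPU and GPU) RNGs deterministic
    # so that training and inference runs are reproducible.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False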

# fuse the results of the three models
python rong.py
# fused output is written to out/kddcup/scibert_rong/
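
The fusion logic lives in rong.py. Below is a minimal sketch of a simple score-averaging fusion over the three submission files; the file names, output path, and plain-average scheme are assumptions, not necessarily what rong.py does.

import json
import os
import numpy as np

# Placeholder paths; the real submission files are written under out/kddcup/.
files = [
    "out/kddcup/scibert_eval1-434/submission.json",
    "out/kddcup/scibert_eval1-429/submission.json",
    "out/kddcup/scibert_eval1-423/submission.json",
]
subs = [json.load(open(path, encoding="utf-8")) for path in files]

# Each submission maps a paper id to one score per candidate reference;
# fuse by averaging the three score lists element-wise.
fused = {
    pid: np.mean([sub[pid] for sub in subs], axis=0).tolist()
    for pid in subs[0]
}

os.makedirs("out/kddcup/scibert_rong", exist_ok=True)
with open("out/kddcup/scibert_rong/fused_submission.json", "w", encoding="utf-8") as f:
    json.dump(fused, f, indent=2)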

Model weights

The three trained model weights and the pretrained weights are available here: 2024-kddcup-pst-rank5-chinesegpt, https://pan.baidu.com/s/1gIt6ZzZGOTRW6VeFcDRu6w (password: eyla). The archive also contains the inference results.

Method

We run further experiments on top of the baseline code. The main changes are in how the training data are processed and how the parameters are tuned, notably appending the title of the cited paper to the model input; see the prepare_bert_input function in bert-eval-434.py.
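
As an illustration of that idea (a sketch under assumed details, not the actual prepare_bert_input code), the citation context and the cited paper's title can be passed to the tokenizer as a sentence pair:

from transformers import AutoTokenizer

# Local pretrained SciBERT weights, as in the directory structure above.
tokenizer = AutoTokenizer.from_pretrained("scibert_scivocab_uncased")

def build_example(context: str, cited_title: str):
    # Encode the citation context and the cited paper's title as a
    # BERT-style sentence pair: [CLS] context [SEP] title [SEP].
    return tokenizer(
        context,
        cited_title,
        truncation=True,
        max_length=512,
        padding="max_length",
    )

example = build_example(
    "... as shown in [12], pretraining on scientific text improves ...",
    "SciBERT: A Pretrained Language Model for Scientific Text",
)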

Results on Test Set

Method      MAP
model1      0.434
model2      0.429
model3      0.423
ensemble    0.449
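
MAP here is the mean over test papers of the average precision of the predicted reference scores against the ground-truth source papers. A minimal sketch of such an evaluation (an assumed scoring convention, not the official evaluation script) is:

import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(labels_by_paper, scores_by_paper):
    # labels_by_paper / scores_by_paper map a paper id to the 0/1
    # ground-truth labels and the predicted scores for its references.
    aps = [
        average_precision_score(labels_by_paper[pid], scores_by_paper[pid])
        for pid in labels_by_paper
    ]
    return float(np.mean(aps))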

If you have any questions, please contact me by email: [email protected]
