- Linux
- Python 3.10.12
- PyTorch 2.0.0
- CUDA 12.0
- NVIDIA RTX 4090
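To compare a local machine against the versions listed above, a small check like the following may help. It is only a sketch: `describe_environment` is a hypothetical helper, not part of this repo, and it reports PyTorch details only when `torch` is importable.

```python
import platform


def describe_environment():
    """Collect the versions that matter for reproducing the setup."""
    info = {
        "os": platform.system(),
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        "gpu": None,
    }
    try:
        import torch  # optional: only reported when installed
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Small differences in patch versions are usually harmless, but a different CUDA or PyTorch major version may change the results.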
Clone this repo.
git clone
cd paper-source-trace
Install the dependencies with:
pip install -r requirements.txt
The dataset can be downloaded from BaiduPan (password: bft3), Aliyun, or DropBox. The paper XML files were generated from the paper PDFs with the Grobid APIs.
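Grobid writes its output as TEI XML, so the reference list of each paper can be read with the standard library. The sketch below assumes the usual TEI layout (`listBibl` containing `biblStruct` entries). `reference_titles` and the inline sample document are illustrative only and are not taken from this repo.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Tiny hand-written stand-in for a Grobid output file.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><back><div type="references"><listBibl>
    <biblStruct xml:id="b0">
      <analytic><title level="a" type="main">First cited paper</title></analytic>
      <monogr><title level="j">Some Journal</title></monogr>
    </biblStruct>
    <biblStruct xml:id="b1">
      <monogr><title level="m" type="main">A cited book</title></monogr>
    </biblStruct>
  </listBibl></div></back></text>
</TEI>"""


def reference_titles(tei_xml):
    """Map each reference id (b0, b1, ...) to the title of the cited work."""
    root = ET.fromstring(tei_xml)
    titles = {}
    for bibl in root.iterfind(".//tei:listBibl/tei:biblStruct", TEI_NS):
        # Articles carry their title under <analytic>; books under <monogr>.
        title = bibl.find("tei:analytic/tei:title", TEI_NS)
        if title is None:
            title = bibl.find("tei:monogr/tei:title", TEI_NS)
        text = "".join(title.itertext()).strip() if title is not None else ""
        titles[bibl.get(XML_ID)] = text
    return titles


if __name__ == "__main__":
    print(reference_titles(SAMPLE_TEI))
```

For a real file, read its contents and pass the string to `reference_titles`.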
--paper-source-trace-main
    --scibert_scivocab_uncased
    --out
        --kddcup
            --scibert_eval1-434
            --scibert_eval1-429
            --scibert_eval1-423
    --data
        --pst
            --paper-xml (load competition dataset)
# Three models were trained with different parameters
python bert-eval-434.py
python bert-eval-429.py
python bert-eval-423.py
# Output (model weights and results) is written to out/kddcup/
# inference
Comment out the other training functions, configure the weights and the test file, then run the gen_kddcup_valid_submission_bert function:
if __name__ == "__main__":
    seed = 2023
    setup_seed(seed)
    # prepare_bert_input()
    # train(model_name="scibert")
    gen_kddcup_valid_submission_bert(model_name="scibert")
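The `setup_seed(seed)` call fixes the random state before inference. The repo's own implementation is not shown here. A typical seeding helper, given as an assumption under the hypothetical name `seed_everything`, might look like this:

```python
import random


def seed_everything(seed):
    """Seed every random number generator that is available."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # ignored when CUDA is unavailable
        # Trade some speed for repeatable cuDNN kernels.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


if __name__ == "__main__":
    seed_everything(2023)
    first = random.random()
    seed_everything(2023)
    print(first == random.random())
```

Even with fixed seeds, results may vary slightly across GPU models and library versions.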
# Fusion of model results
python rong.py
# Output is written to out/kddcup/scibert_rong/
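How `rong.py` combines the three models is not described here. One common approach is to average the per-reference scores. The sketch below assumes each result is a dict mapping a paper id to a list of scores, one per reference. That format, the function name `average_submissions` and the optional weights are all assumptions.

```python
def average_submissions(submissions, weights=None):
    """Fuse several {paper_id: [score, ...]} dicts by weighted averaging."""
    if not submissions:
        raise ValueError("at least one submission is required")
    if weights is None:
        weights = [1.0] * len(submissions)
    if len(weights) != len(submissions):
        raise ValueError("one weight per submission is required")
    total = sum(weights)
    fused = {}
    for paper_id, first_scores in submissions[0].items():
        columns = [sub[paper_id] for sub in submissions]
        if any(len(col) != len(first_scores) for col in columns):
            raise ValueError(f"score lists differ in length for {paper_id}")
        # zip(*columns) yields, per reference, the scores from every model.
        fused[paper_id] = [
            sum(w * s for w, s in zip(weights, ref_scores)) / total
            for ref_scores in zip(*columns)
        ]
    return fused


if __name__ == "__main__":
    model_a = {"paper1": [0.9, 0.1]}
    model_b = {"paper1": [0.6, 0.4]}
    model_c = {"paper1": [0.3, 0.1]}
    print(average_submissions([model_a, model_b, model_c]))
```

Equal weights are the simplest choice. Weighting each model by its validation MAP is another option.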
The three model weights and the pretrained weights are available here: 2024-kddcup-pst-rank5-chinesegpt https://pan.baidu.com/s/1gIt6ZzZGOTRW6VeFcDRu6w (password: eyla). The weights archive also contains the inference results.
We ran further experiments on top of the baseline code. The main changes are in training-data processing and parameter tuning, including adding the title of the cited paper to the model input; see the prepare_bert_input function in bert_eval-434.py.
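As a rough illustration of adding the cited paper's title to the input, the sketch below joins a citation context and a title with a separator token. This is not the code from prepare_bert_input. The function name `build_input_text`, the `[SEP]` separator and the character limit are all assumptions.

```python
def build_input_text(citation_context, cited_title, sep_token="[SEP]",
                     max_context_chars=400):
    """Join the citation context and the cited paper's title into one string."""
    # Collapse runs of whitespace left over from XML extraction.
    context = " ".join(citation_context.split())[:max_context_chars]
    title = " ".join(cited_title.split())
    if not title:
        return context  # fall back to the context alone
    return f"{context} {sep_token} {title}"


if __name__ == "__main__":
    print(build_input_text(
        "Our method builds  directly on [3].",
        "An Example Cited Paper",
    ))
```

With a BERT-style tokenizer, the two parts could also be passed as a sentence pair instead of one joined string.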
Method | MAP |
---|---|
model1 | 0.434 |
model2 | 0.429 |
model3 | 0.423 |
ensemble | 0.449 |
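The MAP column is mean average precision. For each paper the references are ranked by predicted score, precision is averaged over the positions of the true source references, and the per-paper values are then averaged. The sketch below is a minimal version under the assumption of binary labels. The function names are hypothetical, and the official evaluation script may treat papers without positive labels differently.

```python
def average_precision(labels, scores):
    """Average precision for one paper: labels are 0/1, scores are predictions."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    # Assumption: a paper with no positive reference scores 0.0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_paper):
    """per_paper is a list of (labels, scores) pairs, one per paper."""
    values = [average_precision(labels, scores) for labels, scores in per_paper]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    papers = [
        ([1, 0, 1], [0.9, 0.8, 0.7]),
        ([0, 1], [0.2, 0.9]),
    ]
    print(round(mean_average_precision(papers), 4))
```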
If you have any questions, please contact me. Email:[email protected]