Issues: huggingface/trl


Direct Q-Function Optimization ✨ enhancement
#2526 opened Dec 28, 2024 by catherinelee274
Integrate OREO into TRL and HF ✨ enhancement
#2525 opened Dec 28, 2024 by August-murr
3 tasks done
[question] best way to have my own reward model which is backed by rules 🏋 PPO ❓ question
#2518 opened Dec 24, 2024 by yananchen1989
Soft Actor-Critic (SAC) Trainer ✨ enhancement
#2517 opened Dec 23, 2024 by AMindToThink
3 tasks
RLOO trainer epochs/steps/episodes calculations seem not to be working properly 🐛 bug 🏋 RLOO
#2515 opened Dec 23, 2024 by dawidm
7 of 9 tasks
Checkpointing is failing with SFTTrainer PEFT LoRA on DeepSpeed Zero-3 🐛 bug ⚡ PEFT 🏋 SFT
#2514 opened Dec 21, 2024 by SwayamInSync
7 of 9 tasks
DDPO checkpoint 🐛 bug 🏋 DDPO 🙋 help from community wanted ⏳ needs more info
#2505 opened Dec 20, 2024 by nguyenhoa-uit
5 of 9 tasks
Spectrum training support ✨ enhancement 🏋 SFT
#2504 opened Dec 19, 2024 by ggbetz
[bug] objective/entropy < 0 when using RLOOTrainer and PPOTrainer 🙋 help from community wanted 🏋 PPO ❓ question 🏋 RLOO
#2496 opened Dec 17, 2024 by macheng6
[Tracking issue] Integrate native liger-kernel losses ✨ enhancement 🧒 good second issue
#2495 opened Dec 17, 2024 by qgallouedec
5 tasks
DeepSpeed with trl 🐛 bug 🚀 deepspeed 🏋 DPO ⏳ needs more info
#2490 opened Dec 16, 2024 by sagie-dekel
7 of 9 tasks
RewardConfig's max_length argument docstring should indicate that it filters out the dataset, rather than truncating it 📚 documentation 👶 good first issue 🙋 help from community wanted 🏋 Reward
#2488 opened Dec 16, 2024 by Kallinteris-Andreas
Trainer forces the use of a specific collator 🏋 GKD ❓ question
#2481 opened Dec 14, 2024 by hteague-qti
KeyError in DPO Trainer, evaluation_loop 🐛 bug 🏋 DPO
#2473 opened Dec 13, 2024 by qingjianbuyi
7 of 9 tasks
A question about RLOOTrainer 🙋 help from community wanted ❓ question 🏋 RLOO
#2472 opened Dec 13, 2024 by macheng6
1 of 3 tasks
Provide Descriptions (READMEs) for trl-lib/dataset 🗃️ data 📚 documentation ✨ enhancement 👶 good first issue 🙋 help from community wanted
#2470 opened Dec 13, 2024 by Kallinteris-Andreas
Packing in DPOTrainer 🏋 DPO ✨ enhancement
#2469 opened Dec 13, 2024 by zhc7
DPOTrainer log metrics are not gathered and averaged across ranks 🐛 bug 🏋 DPO
#2468 opened Dec 13, 2024 by zhc7