dpo_trainer gather metrics across ranks before logging #2474

zhc7 · 2024-12-13T17:29:23Z

according to #2468
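As a rough illustration of what the title describes, the sketch below simulates three ranks in plain Python (no distributed backend) and compares the value a single rank would log with the value obtained after gathering from every rank. The metric values and the helper names `log_local` and `log_gathered` are hypothetical and are not taken from the trainer.

```python
from statistics import mean

# Hypothetical per-rank values of one stored metric,
# two logging steps on each of three simulated ranks.
stored_per_rank = [
    [0.50, 0.70],  # rank 0
    [0.90, 0.80],  # rank 1
    [0.20, 0.40],  # rank 2
]


def log_local(rank):
    """Value a rank would log when averaging only its own stored metrics."""
    return mean(stored_per_rank[rank])


def log_gathered():
    """Value logged when the metrics of every rank are gathered first."""
    return mean(v for rank_values in stored_per_rank for v in rank_values)


print("rank 0 only:", log_local(0))
print("all ranks:  ", log_gathered())
```

The two printed values differ, so a log built from one rank's metrics alone would not reflect the whole batch.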

qgallouedec · 2024-12-13T17:39:39Z

trl/trainer/dpo_trainer.py

@@ -1424,7 +1424,11 @@ def log(self, logs: dict[str, float], start_time: Optional[float] = None) -> Non
        train_eval = "train" if "loss" in logs else "eval"
        # Add averaged stored metrics to logs
        for key, metrics in self._stored_metrics[train_eval].items():
-            logs[key] = torch.tensor(metrics).mean().item()
+            if isinstance(metrics[0], torch.Tensor):
+                gathered = self._nested_gather([m.cuda() for m in metrics])


do you need .cuda()?

maybe self.accelerator.gather(metrics).mean().item() would be simpler?

> do you need .cuda()?

The metrics were moved to the CPU earlier, and some backends (e.g. NCCL) do not support gathering tensors that live on the CPU. But I admit that .cuda() here loses some generality. Maybe .to(self.accelerator.device) is better?

> maybe self.accelerator.gather(metrics).mean().item() would be simpler?

I agree that self.accelerator.gather is better. But metrics in the loop is a list[torch.Tensor] or a list[float], so gather actually returns a list[torch.Tensor]. So I think I should change it to:

    if isinstance(metrics[0], torch.Tensor):
        gathered = self.accelerator.gather([m.to(self.accelerator.device) for m in metrics])
        metrics = [g.mean() for g in gathered]
    meaned = torch.tensor(metrics).mean()
    logs[key] = meaned.item()

I know that creating a new tensor from metrics looks a little weird, but that is how it was originally written. I don't know why, but I don't want to break anything, so I left it as is.

HuggingFaceDocBuilderDev · 2024-12-13T17:43:29Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

dpo_trainer gather metrics across ranks before logging

8febff8

according to huggingface#2468

qgallouedec reviewed Dec 13, 2024

Merge branch 'main' into patch-2

2e90890
