RNaD reward transformation #1075
Comments
@perolat, @bartdevylder: any ideas?
Hi,
Hi, thank you for your reply. I understand this already. What I want to understand is why `merged_log_policy` is multiplied by the policy in the code when the paper does not mention this.
Hi,
@perolat any updates on this?
@spktrm Do you know the reason? Thanks.
Based on the formulae in the paper, the reward transformation is given by adding the log policy ratio to the reward.
However, the code contains an entropy term instead:
https://github.com/deepmind/open_spiel/blob/db0f4a78b1fd0bee0263d46d62fb4d693897329e/open_spiel/python/algorithms/rnad/rnad.py#L422
Which one is it?
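To make the two candidates concrete, here is a minimal sketch of how they differ. It is not taken from `rnad.py`. The function names, the example policies, and the value of `eta` are all hypothetical, and it assumes the "entropy term" means the per-action log policy ratio weighted by the policy and summed over actions, as the `merged_log_policy` comment above suggests.

```python
import math


def per_action_log_ratio(policy, reg_policy, eta):
    """Return eta * log(pi(a) / pi_reg(a)) for every action a."""
    return [eta * math.log(p / q) for p, q in zip(policy, reg_policy)]


def expected_log_ratio(policy, reg_policy, eta):
    """Return the policy-weighted sum of the per-action terms.

    This equals eta * KL(pi || pi_reg), i.e. an entropy-like quantity.
    """
    terms = per_action_log_ratio(policy, reg_policy, eta)
    return sum(p * t for p, t in zip(policy, terms))


# Hypothetical numbers, chosen only for illustration.
policy = [0.7, 0.2, 0.1]
reg_policy = [0.5, 0.3, 0.2]
eta = 0.2

# One value per action: what a sampled action would contribute.
sampled_terms = per_action_log_ratio(policy, reg_policy, eta)

# A single scalar: the expectation of the above under the policy.
expected_term = expected_log_ratio(policy, reg_policy, eta)
```

Under these assumptions the second quantity is just the expectation of the first over actions drawn from the policy, so the two forms agree on average but differ for any single sampled action. Whether that is the reason for the form used in the code is exactly what this issue asks.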