RNaD reward transformation #1075

spktrm · 2023-05-25T09:58:10Z

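Based on the formulae in the paper, the reward transformation adds the log policy ratio, i.e. the log of the ratio between the policy and the regularisation policy.

However, the code uses an entropy term instead, in which the log policy ratio is weighted by the policy: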

https://github.com/deepmind/open_spiel/blob/db0f4a78b1fd0bee0263d46d62fb4d693897329e/open_spiel/python/algorithms/rnad/rnad.py#L422
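
For concreteness, here is a minimal pure-Python sketch of the two readings. It assumes the paper's term is the per-sample log ratio of the action actually taken, and that the code's term is that ratio weighted by the policy over all actions. The three-action policies, `eta` and the function names are illustrative and are not taken from rnad.py. Averaging the per-sample term under the policy reproduces the weighted term exactly, i.e. -eta times KL(pi || pi_reg):

```python
import math

def sampled_log_ratio_term(pi, pi_reg, action, eta):
    """Per-sample term -eta * log(pi(a) / pi_reg(a)) for the action actually taken."""
    return -eta * math.log(pi[action] / pi_reg[action])

def policy_weighted_term(pi, pi_reg, eta):
    """Weighted term -eta * sum_a pi(a) * log(pi(a) / pi_reg(a)) = -eta * KL(pi || pi_reg)."""
    return -eta * sum(p * math.log(p / q) for p, q in zip(pi, pi_reg))

pi = [0.5, 0.3, 0.2]            # current policy (illustrative)
pi_reg = [1 / 3, 1 / 3, 1 / 3]  # regularisation policy (illustrative)
eta = 0.2

# Per-sample term for each possible action.
per_action = [sampled_log_ratio_term(pi, pi_reg, a, eta) for a in range(len(pi))]
# Its expectation under pi, computed exactly by summing over actions.
expected = sum(p * t for p, t in zip(pi, per_action))
weighted = policy_weighted_term(pi, pi_reg, eta)
print(per_action)
print(expected, weighted)  # equal up to floating-point rounding
```

So the two agree in expectation but not per sample: here the per-sample term is negative for action 0 and positive for action 2, while the weighted term is a single deterministic, non-positive value per state. The sketch does not settle which of the two rnad.py is meant to use.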

Which one is it?

lanctot · 2023-06-01T19:16:14Z

@perolat, @bartdevylder: any ideas?

bartdevylder · 2023-06-13T13:14:45Z

Hi,
Thanks for your question. The merged_log_policy term in the line you posted already contains the log policy ratio. It is defined here, taking into account the interpolation between the two regularization policies:
https://github.com/deepmind/open_spiel/blob/db0f4a78b1fd0bee0263d46d62fb4d693897329e/open_spiel/python/algorithms/rnad/rnad.py#L801
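
A rough sketch of that interpolation, assuming merged_log_policy is log pi minus an alpha-weighted blend of the log-probabilities of two regularisation policies. The helper name, the names `reg_new` and `reg_old`, and which policy alpha weights are hypothetical and are not asserted to match rnad.py:

```python
import math

def merged_log_policy(pi, reg_new, reg_old, alpha):
    """Hypothetical helper: log pi(a) - [alpha*log reg_new(a) + (1-alpha)*log reg_old(a)] per action.

    alpha in [0, 1] blends the two regularisation policies: alpha = 1 gives the
    plain log ratio log(pi / reg_new), and alpha = 0 gives log(pi / reg_old).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [math.log(p) - (alpha * math.log(n) + (1.0 - alpha) * math.log(o))
            for p, n, o in zip(pi, reg_new, reg_old)]

pi = [0.5, 0.3, 0.2]             # current policy (illustrative)
reg_new = [0.4, 0.4, 0.2]        # one regularisation policy (illustrative)
reg_old = [1 / 3, 1 / 3, 1 / 3]  # the other regularisation policy (illustrative)

for alpha in (0.0, 0.5, 1.0):
    print(alpha, merged_log_policy(pi, reg_new, reg_old, alpha))
```

The blend equals log(pi / (reg_new^alpha * reg_old^(1 - alpha))), a log ratio against an unnormalised geometric mixture of the two regularisation policies. It is therefore still a per-action log policy ratio, which is the point made in the reply above.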

spktrm · 2023-06-13T13:28:55Z

Hi,

Thank you for your reply. I already understand that. What I want to understand is why merged_log_policy is multiplied by the policy in the code, when this is not mentioned in the paper.

bartdevylder · 2023-06-14T07:08:56Z

Hi,
OK, now I see your point. The eta_log_policy variable corresponds to the regularisation described in the paper, but the meaning of eta_reg_entropy is less clear. @perolat will look into this and clarify.
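
The following sketch contrasts the two quantities. It is reconstructed only from the description in this thread: eta_log_policy is treated as a per-action value, and eta_reg_entropy as the policy-weighted sum of merged_log_policy. Batching, per-player masking and everything else in rnad.py are left out, and the helper names are hypothetical. Because the regulariser is interpolated, the weighted term works out to -eta times an alpha-weighted sum of two KL divergences, and the sketch checks this numerically:

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def regularisation_terms(pi, reg_new, reg_old, alpha, eta):
    """Mimics the two quantities discussed in the thread (names reused for readability)."""
    # Per-action log ratio against the interpolated regulariser.
    merged_log = [math.log(p) - (alpha * math.log(n) + (1.0 - alpha) * math.log(o))
                  for p, n, o in zip(pi, reg_new, reg_old)]
    # Paper-style term: one value per action, read off at the sampled action.
    eta_log_policy = [-eta * m for m in merged_log]
    # Code-style term: log ratio weighted by the policy and summed over actions.
    eta_reg_entropy = -eta * sum(p * m for p, m in zip(pi, merged_log))
    return eta_log_policy, eta_reg_entropy

pi = [0.5, 0.3, 0.2]
reg_new = [0.4, 0.4, 0.2]
reg_old = [1 / 3, 1 / 3, 1 / 3]
alpha, eta = 0.25, 0.2

eta_log_policy, eta_reg_entropy = regularisation_terms(pi, reg_new, reg_old, alpha, eta)
decomposed = -eta * (alpha * kl(pi, reg_new) + (1.0 - alpha) * kl(pi, reg_old))
print(eta_log_policy)
print(eta_reg_entropy, decomposed)  # equal up to floating-point rounding
```

This identifies the policy-weighted term as a scaled (negated) KL penalty rather than a plain entropy. It does not explain why rnad.py uses it, which remains the open question in this issue.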

spktrm · 2023-06-23T23:01:16Z

@perolat any updates on this?

sbl1996 · 2024-04-15T12:31:35Z

@spktrm Do you know the reason? Thanks.

spktrm · 2024-04-15T23:39:16Z

> @spktrm Do you know the reason? Thanks.

Nope, unfortunately. Waiting for @perolat or someone else involved to clarify.

spktrm mentioned this issue Dec 13, 2023: RNaD: Possible Error in calculation of Neurd Loss #1156 (Closed)

lanctot assigned perolat Dec 15, 2023

spktrm mentioned this issue Jan 30, 2024: RNaD off policy case #1109 (Open)
