RNaD: Possible Error in calculation of Neurd Loss #1156

spktrm · 2023-12-12T11:35:09Z

In this line of the RNaD algorithm

open_spiel/open_spiel/python/algorithms/rnad/rnad.py

Line 574 in 7c58b6c

logits = logit_pi - jnp.mean(

Should the line instead be the following, so that we subtract only the mean computed over the valid (legal) logits?

logits = logit_pi - (
    jnp.sum(logit_pi * legal_actions, axis=-1, keepdims=True)
    / jnp.sum(legal_actions, axis=-1, keepdims=True))
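To make the difference concrete, here is a minimal NumPy sketch (not the RNaD code itself; `jnp` exposes the same `sum` and `mean` calls). The helper names `center_all_slots` and `center_masked` and the input values are hypothetical, chosen only for illustration. The quoted original line is truncated above, so `center_all_slots` is just one plausible way a mean can divide by the total number of actions, not a claim about what that line does. The sketch assumes every row has at least one legal action.

```python
import numpy as np

def center_all_slots(logit_pi, legal_actions):
    # Illustrative baseline: zero out illegal logits, then average over
    # ALL action slots, so the divisor is the total number of actions.
    return logit_pi - np.mean(logit_pi * legal_actions, axis=-1, keepdims=True)

def center_masked(logit_pi, legal_actions):
    # Proposed form: divide the masked sum by the number of legal actions.
    # Assumes at least one legal action per row (otherwise division by zero).
    masked_sum = np.sum(logit_pi * legal_actions, axis=-1, keepdims=True)
    num_legal = np.sum(legal_actions, axis=-1, keepdims=True)
    return logit_pi - masked_sum / num_legal

# Made-up example: 4 actions, the last two illegal.
logit_pi = np.array([[2.0, 0.0, -3.0, 7.0]])
legal_actions = np.array([[1.0, 1.0, 0.0, 0.0]])

all_slots = center_all_slots(logit_pi, legal_actions)
masked = center_masked(logit_pi, legal_actions)

# Sum of the centered logits over legal actions only.
legal_sum_all_slots = np.sum(all_slots * legal_actions, axis=-1)
legal_sum_masked = np.sum(masked * legal_actions, axis=-1)
```

With the masked mean, the centered logits of the legal actions sum to zero; with the all-slots divisor they do not, and the offset grows with the number of illegal actions.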

As a result, should the line below be an average over actions rather than a sum?

open_spiel/open_spiel/python/algorithms/rnad/rnad.py

Line 579 in 7c58b6c

nerd_loss = jnp.sum(

i.e.

nerd_loss = jnp.sum(
        legal_actions *
        apply_force_with_threshold(logits, adv_pi, threshold, threshold_center),
        axis=-1) / jnp.sum(legal_actions, axis=-1)

This is particularly relevant in games where many actions are frequently invalid.
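As a rough illustration of the sum-versus-average question, the NumPy sketch below compares the two reductions on two made-up states with different numbers of legal actions. The array `per_action_term` is a stand-in for whatever `apply_force_with_threshold` returns (its real values are not reproduced here), and `loss_sum` and `loss_mean` are hypothetical helper names. It assumes each state has at least one legal action.

```python
import numpy as np

def loss_sum(per_action_term, legal_actions):
    # Sum the per-action term over legal actions.
    return np.sum(legal_actions * per_action_term, axis=-1)

def loss_mean(per_action_term, legal_actions):
    # Same sum, divided by the number of legal actions in each state.
    # Assumes at least one legal action per state.
    return loss_sum(per_action_term, legal_actions) / np.sum(legal_actions, axis=-1)

# Stand-in values: every action contributes 1.0 in both states.
per_action_term = np.ones((2, 8))
legal_actions = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0],  # state with 2 legal actions
    [1, 1, 1, 1, 1, 1, 1, 1],  # state with 8 legal actions
], dtype=float)

summed = loss_sum(per_action_term, legal_actions)
averaged = loss_mean(per_action_term, legal_actions)
```

Under the sum, the state with more legal actions carries a proportionally larger loss; under the average, both states contribute equally. Because the average is the sum scaled by one over the number of legal actions, switching between them changes the effective gradient scale per state, so settings such as the learning rate may need retuning.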

lanctot · 2023-12-13T12:14:08Z

@perolat can you take a look?

lanctot · 2023-12-13T13:33:00Z

Hi @spktrm, I spoke to Julien.

He said you're correct about the first one, can you submit a PR?

The second one could go either way: it's just a matter of knowing what works. It is not clear whether one works better than the other, and the two might end up behaving similarly but require different hyperparameters. Maybe you can try it and let us know?

spktrm · 2023-12-13T21:35:30Z

I have submitted a PR for the first point: #1157. Thank you for the opportunity to contribute :).

With regard to the second point, I will experiment further with the fix I am suggesting and let you know how it goes.

Meanwhile, is it possible to provide clarity on these other issues? Namely:

lanctot · 2023-12-13T21:45:43Z

Hi @spktrm,

Yeah I will make Julien aware of those (sorry, I thought they were resolved already).

I think it may be useful to also try contacting him directly by email, though... because I'm mostly just relaying messages from here to him and back :)

spktrm · 2023-12-14T05:38:27Z

Thank you. What is his best email?

lanctot · 2023-12-14T12:26:41Z

> Thank you. What is his best email?

Still the same one from the Mastering Stratego paper.

lanctot · 2024-01-04T01:28:41Z

Fixed by #1157, which has now been merged into master.

spktrm mentioned this issue Dec 13, 2023

fix mean logit calculation in neurd loss for rnad #1157

Merged

lanctot added the bug (Something isn't working) and fixed (This is fixed internally, and will be merged in the next github sync!) labels Dec 15, 2023

lanctot closed this as completed Jan 4, 2024
