Hello, thanks for your great work!

I noticed that in /src/a3c.py, lines 271-277, `self.network = LSTMPolicy(env.observation_space.shape, numaction, designHead)` is defined within the scope "local", while `self.ap_network = StatePredictor(env.observation_space.shape, numaction, designHead, unsupType)` is defined within the scope "predictor" nested under "local". I believe (based on a quick MNIST test with a simple CNN) this means the `designHead` weights used by the two classes are different, even though the `designHead` structures are identical, because they live under different variable scopes.
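To illustrate what I mean, here is a minimal TF1-style sketch (not code from this repo; `head` is just a hypothetical stand-in for the design head) showing that the same builder called under different variable scopes creates separate variables:

```python
import tensorflow as tf  # assumes the TF1.x API (tf.compat.v1 under TF2)

def head(x):
    # hypothetical stand-in for the conv design head
    return tf.layers.conv2d(x, 32, 3, name="conv1")

x = tf.placeholder(tf.float32, [None, 42, 42, 4])

with tf.variable_scope("local"):
    pi_feat = head(x)            # creates local/conv1/kernel, local/conv1/bias
    with tf.variable_scope("predictor"):
        pred_feat = head(x)      # creates local/predictor/conv1/kernel, ...

# Two independent sets of weights exist, even though head() is the same code:
print([v.name for v in tf.trainable_variables()])
```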
In the `LSTMPolicy` class, the inputs are fed into the `designHead` and its outputs are fed into an LSTM for policy and value function prediction. In the `StatePredictor`/`StateActionPredictor` class, however, the forward and inverse models are built on a `designHead` with different weights since, as noted above, `LSTMPolicy` and `StatePredictor` live under different scopes.

I was wondering why, in /src/a3c.py lines 271-277, `LSTMPolicy` and `StatePredictor` are not placed under the same scope so that their `designHead` would share weights. In other words, if they use different weights, it seems the forward and inverse models are trained independently of the A3C policy and value function, while the A3C policy/value function is still affected by the forward loss through the intrinsic reward.
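If weight sharing were intended, one common TF1 pattern (again just a sketch continuing the example above, not what a3c.py currently does) would be to reuse the same variable scope for the second call:

```python
with tf.variable_scope("local"):
    pi_feat = head(x)
with tf.variable_scope("local", reuse=True):
    pred_feat = head(x)  # reuses local/conv1/* instead of creating new variables
```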
Thank you,
Li