Getting sac to work --- WIP #599

Open
wants to merge 148 commits into
base: user/michel-aractingi/2024-11-27-port-hil-serl

@Ke-Wang1017 Ke-Wang1017 commented Dec 26, 2024

What this does

Gets SAC running without errors. The main changes:

  • Enhanced standard deviation clamping to prevent extreme values, ensuring stability in policy outputs.
  • Cleaned up code by removing unnecessary comments and improving readability.
  • Separated encoders for critic and actor in SACPolicy to enhance model performance.
  • Updated action selection method to use distribution mode for better inference.
  • Refactored critic forward pass to streamline Q-value calculations.
  • Improved temperature loss calculation and added comments for clarity.
  • Changed YAML configuration to switch observation type for keypoints evaluation.
  • Modified action sampling in SACPolicy to use reparameterized sampling, ensuring better gradient flow and log probability calculations.
  • Cleaned up log probability calculations in TanhMultivariateNormalDiag for clarity and efficiency.
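
As a rough illustration of the sampling-related items above (clamped standard deviation, reparameterized sampling, tanh-squashed log probability, and mode-based action selection), here is a minimal sketch in plain Python. It is not the code from this PR: the function names, the `std_min`/`std_max` defaults, and the use of `tanh(mean)` as the deterministic action are assumptions made for illustration, and the real implementation operates on PyTorch tensors.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_tanh_gaussian(mean, std, std_min=1e-5, std_max=10.0, rng=random):
    """Reparameterized sample from a tanh-squashed diagonal Gaussian.

    Returns (action, log_prob). The clamp bounds are hypothetical defaults,
    not the values used in the PR.
    """
    action, log_prob = [], 0.0
    for mu, s in zip(mean, std):
        s = max(std_min, min(std_max, s))  # clamp std to avoid extreme values
        eps = rng.gauss(0.0, 1.0)          # noise does not depend on mu or s
        u = mu + s * eps                   # reparameterized pre-squash sample
        # Log-density of u under N(mu, s^2); note (u - mu) / s == eps.
        log_prob += -0.5 * eps * eps - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        # Change of variables for a = tanh(u): subtract log(1 - tanh(u)^2),
        # written in the stable form 2 * (log 2 - u - softplus(-2u)).
        log_prob -= 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))
        action.append(math.tanh(u))
    return action, log_prob


def mode_action(mean):
    # Deterministic action for inference: squash the Gaussian mean.
    # (Assumption: this is what "distribution mode" refers to.)
    return [math.tanh(mu) for mu in mean]
```

Because the noise `eps` is drawn independently of `mean` and `std`, the sampled action is a differentiable function of both, which is what lets gradients flow through the sample in an autograd framework.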

How it was tested

python scripts/train.py policy=sac_pusht_keypoints env=pusht +dataset=lerobot/pusht_keypoints

aliberts and others added 30 commits November 25, 2024 12:44
Co-authored-by: Daniel Ritchie
Co-authored-by: resolver101757
Co-authored-by: Jannik Grothusen
Co-authored-by: Remi
Co-authored-by: Michel Aractingi
…ing logic

- Added `num_subsample_critics`, `critic_target_update_weight`, and `utd_ratio` to SACConfig.
- Implemented target entropy calculation in SACPolicy if not provided.
- Introduced subsampling of critics to prevent overfitting during updates.
- Updated temperature loss calculation to use the new target entropy.
- Added comments for future UTD update implementation.

These changes improve the flexibility and performance of the SAC implementation.
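
The pieces named in this commit message can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the PR's code: the helper names are hypothetical, and the default target entropy shown (the negative of the action dimension) is a common SAC heuristic rather than a value confirmed by this PR.

```python
import math
import random


def default_target_entropy(action_dim):
    # Common heuristic (assumed here): target entropy = -|A|.
    return -float(action_dim)


def temperature_loss(log_alpha, log_probs, target_entropy):
    # Mean over the batch of -alpha * (log_prob + target_entropy).
    # Minimizing this lowers alpha when policy entropy exceeds the target
    # and raises it when entropy falls below the target.
    alpha = math.exp(log_alpha)
    return sum(-alpha * (lp + target_entropy) for lp in log_probs) / len(log_probs)


def subsampled_min_q(q_values, num_subsample_critics, rng=random):
    # q_values holds one target Q estimate per critic for a single transition.
    # Take the minimum over a random subset of critics instead of all of them.
    chosen = rng.sample(range(len(q_values)), num_subsample_critics)
    return min(q_values[i] for i in chosen)


def soft_update(target_params, online_params, critic_target_update_weight):
    # Polyak averaging of target critic parameters toward the online critic.
    w = critic_target_update_weight
    return [w * p + (1.0 - w) * t for t, p in zip(target_params, online_params)]
```

For example, with two action dimensions `default_target_entropy(2)` gives `-2.0`, and `soft_update([0.0], [1.0], 0.005)` moves the target parameter 0.5% of the way toward the online one.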
- Increased `latent_dim` from 50 to 128 for improved representation.
- Separated encoders for critic and actor in `SACPolicy` to enhance model performance.
- Updated action selection method to use distribution mode for better inference.
- Refactored critic forward pass to streamline Q-value calculations.
- Improved temperature loss calculation and added comments for clarity.
- Updated YAML configuration to switch observation type for keypoints evaluation.

These changes aim to enhance the flexibility, performance, and clarity of the SAC implementation.
…n handling

- Updated action selection to use distribution sampling and log probabilities for better stochastic behavior.
- Enhanced standard deviation clamping to prevent extreme values, ensuring stability in policy outputs.
- Cleaned up code by removing unnecessary comments and improving readability.

These changes aim to refine the SAC implementation, enhancing its robustness and performance during training and inference.
- Updated standard deviation parameterization in SACConfig to 'softplus' with defined min and max values for improved stability.
- Modified action sampling in SACPolicy to use reparameterized sampling, ensuring better gradient flow and log probability calculations.
- Cleaned up log probability calculations in TanhMultivariateNormalDiag for clarity and efficiency.
- Raised the evaluation frequency setting in the YAML configuration to 50000, so evaluation runs less often and training cycles are more efficient.

These changes aim to enhance the robustness and performance of the SAC implementation during training and inference.
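
One way to read the 'softplus' standard deviation parameterization with min and max values is sketched below in plain Python. The function name and the bounds used in the example are hypothetical, and the PR itself may apply the bounds differently.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_std(raw, std_min, std_max):
    # Map an unconstrained network output to a positive std, then clamp it.
    # Unlike exp(raw), softplus grows only linearly for large raw values.
    return max(std_min, min(std_max, softplus(raw)))
```

With illustrative bounds of `1e-5` and `10.0`, a very negative raw output is held at `1e-5` and a very large one at `10.0`, so the policy's standard deviation stays in a bounded range throughout training.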
6 participants