Clamping actions between a range #2630

Answered by vmoens
TRPrasanna asked this question in Q&A

Hi, apologies if this has been addressed before. I am setting up a custom environment and have used the following

            {"action": Bounded(
                low=torch.tensor(-0.1, device=self.device),
                high=torch.tensor(0.1, device=self.device),
                shape=(1,),
                device=self.device
            )},
            batch_size=torch.Size([])
        )

in the init method of my custom environment. However, the actor (in my case, for PPO) picks values beyond the low and high values that I have set above. What would be the correct way to clamp the actions to the same range as the one set for the action key? Let me know if additional context is required.

Edit: I passed safe=True when creating the ProbabilisticActor object and it seems to work, but I am not sure if it is the intended way.

    policy = ProbabilisticActor(
        module=actor_module,
        spec=env.action_spec,
        in_keys=["loc", "scale"],
        distribution_class=TanhNormal,
        return_log_prob=True,
        safe=True,
    ).to(device)


Replies: 1 comment · 1 reply

@vmoens

Yes, safe is the way to go. That will clamp your actions.
I know it's something people do, but I would advise against clamping actions, especially in a policy-optimization setting: when you compute your importance weight, the assumption is that you take the log-prob under the new distribution minus the log-prob under the original one. But since your distribution is effectively truncated by the clamping, the real log-prob isn't the one you're computing (integrating the probability over the space of actions that can actually be sampled gives a total < 1 when it should be = 1).
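
As a sketch of that argument, the PPO-style importance weight is

$$r(\theta) = \exp\big(\log \pi_\theta(a \mid s) - \log \pi_{\theta_\text{old}}(a \mid s)\big),$$

and it assumes both terms are log-probs of properly normalized densities over the actions that can actually be sampled; hard clamping breaks that assumption.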

In practice I usually advise people to use TanhNormal or a TruncatedNormal distribution (you'll find both in torchrl.modules).
https://pytorch.org/rl/stable/reference/generated/torchrl.modules.TanhNormal.html
https://pytorch.org/rl/stable/reference/generated/torchrl.modules.TruncatedNormal.html

That being said, I also acknowledge that a lot of what is done in RL (and ML in general) is done "because it works" rather than because it's theoretically motivated :)
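
As a minimal sketch of that suggestion (assuming a recent TorchRL version; older releases use min/max instead of low/high as keyword names), a bounded TanhNormal keeps samples inside the range and its log_prob already accounts for the squashing:

    import torch
    from torchrl.modules import TanhNormal

    # loc/scale would normally come from the policy network; shapes here are illustrative.
    loc = torch.zeros(4, 1)
    scale = torch.ones(4, 1)

    # Samples are squashed into [low, high]; log_prob applies the
    # change-of-variables correction for the tanh/affine transform,
    # so the density stays properly normalized.
    dist = TanhNormal(loc, scale, low=-0.1, high=0.1)
    action = dist.rsample()
    assert (action >= -0.1).all() and (action <= 0.1).all()
    log_prob = dist.log_prob(action)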

1 reply
@TRPrasanna

Thanks. For now, I think I'll use a scaled TanhNormal distribution:

    class ScaledTanhNormal(TanhNormal):
        def __init__(self, loc, scale):
            super().__init__(loc, scale)
            self.output_scale = 0.1  # Scale factor
        
        def rsample(self, sample_shape=torch.Size()):
            x = super().rsample(sample_shape)
            return x * self.output_scale
    
    policy = ProbabilisticActor(
        module=actor_module,
        spec=env.action_spec,
        in_keys=["loc", "scale"],
        distribution_class=ScaledTanhNormal,
        return_log_prob=True,
        #safe = True
    ).to(device)

Edit: I realize my code above is wrong because the log probabilities are not adjusted for the scaling.
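
A sketch of an alternative that avoids the custom distribution entirely (assuming the ±0.1 bounds from the spec above, and that the installed TorchRL version accepts low/high; older releases use min/max): keep TanhNormal and pass the bounds through distribution_kwargs, so the log-probabilities already include the squashing correction and no clamping is needed:

    policy = ProbabilisticActor(
        module=actor_module,
        spec=env.action_spec,
        in_keys=["loc", "scale"],
        distribution_class=TanhNormal,
        # Bound the squashed actions to [-0.1, 0.1]; log_prob accounts for
        # the transform, so safe=True / manual clamping is unnecessary.
        distribution_kwargs={"low": -0.1, "high": 0.1},
        return_log_prob=True,
    ).to(device)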

Answer selected by TRPrasanna