Clamping actions between a range #2630

Answered by vmoens
TRPrasanna asked this question in Q&A

Hi, apologies if this has been addressed before. I am setting up a custom environment and have used the following

            {"action": Bounded(
                low=torch.tensor(-0.1, device=self.device),
                high=torch.tensor(0.1, device=self.device),
                shape=(1,),
                device=self.device
            )},
            batch_size=torch.Size([])
        )

in the init method of my custom environment. However, the actor (in my case, for PPO) picks values beyond the low and high values that I have set above. What would be the correct way to clamp the actions to the same range as the one set for the action key? Let me know if additional context is required.

Edit: I passed safe=True when creating the ProbabilisticActor object and it seems to work, but I am not sure if it is the intended way.

    policy = ProbabilisticActor(
        module=actor_module,
        spec=env.action_spec,
        in_keys=["loc", "scale"],
        distribution_class=TanhNormal,
        return_log_prob=True,
        safe=True,
    ).to(device)


Replies: 1 comment · 1 reply

@vmoens

Yes, safe is the way to go. That will clamp your actions.
I know it's something people do, but I would advise against clamping actions, especially in a policy-optimization setting: when you compute your importance weight, the assumption is that you take the log-prob under the new distribution minus the log-prob under the original one. But since your distribution is effectively truncated by the clamping, the real log-prob isn't the one you're computing (integrating the probability over the space of actions that can actually be sampled gives a total < 1 when it should be = 1).
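
As a sketch of that argument, the PPO-style importance weight is

$$r(\theta) = \exp\big(\log \pi_\theta(a \mid s) - \log \pi_{\theta_\text{old}}(a \mid s)\big),$$

and it assumes both terms are log-probs of properly normalized densities over the actions that can actually be sampled; hard clamping breaks that assumption.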

In practice I usually advise people to use TanhNormal or a TruncatedNormal distribution (you'll find both in torchrl.modules).
https://pytorch.org/rl/stable/reference/generated/torchrl.modules.TanhNormal.html
https://pytorch.org/rl/stable/reference/generated/torchrl.modules.TruncatedNormal.html

That being said, I also acknowledge that a lot of what is done in RL (and ML in general) is done "because it works" rather than because it's theoretically motivated :)
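
As a minimal sketch of that suggestion (assuming a recent TorchRL version; older releases use min/max instead of low/high as keyword names), a bounded TanhNormal keeps samples inside the range and its log_prob already accounts for the squashing:

    import torch
    from torchrl.modules import TanhNormal

    # loc/scale would normally come from the policy network; shapes here are illustrative.
    loc = torch.zeros(4, 1)
    scale = torch.ones(4, 1)

    # Samples are squashed into [low, high]; log_prob applies the
    # change-of-variables correction for the tanh/affine transform,
    # so the density stays properly normalized.
    dist = TanhNormal(loc, scale, low=-0.1, high=0.1)
    action = dist.rsample()
    assert (action >= -0.1).all() and (action <= 0.1).all()
    log_prob = dist.log_prob(action)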

1 reply
@TRPrasanna

Thanks. For now, I think I'll use a scaled TanhNormal distribution:

    class ScaledTanhNormal(TanhNormal):
        def __init__(self, loc, scale):
            super().__init__(loc, scale)
            self.output_scale = 0.1  # Scale factor
        
        def rsample(self, sample_shape=torch.Size()):
            x = super().rsample(sample_shape)
            return x * self.output_scale
    
    policy = ProbabilisticActor(
        module=actor_module,
        spec=env.action_spec,
        in_keys=["loc", "scale"],
        distribution_class=ScaledTanhNormal,
        return_log_prob=True,
        #safe = True
    ).to(device)

Edit: I realize my code above is wrong because the log probabilities are not adjusted for the scaling.
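
A sketch of an alternative that avoids the custom distribution entirely (assuming the ±0.1 bounds from the spec above, and that the installed TorchRL version accepts low/high; older releases use min/max): keep TanhNormal and pass the bounds through distribution_kwargs, so the log-probabilities already include the squashing correction and no clamping is needed:

    policy = ProbabilisticActor(
        module=actor_module,
        spec=env.action_spec,
        in_keys=["loc", "scale"],
        distribution_class=TanhNormal,
        # Bound the squashed actions to [-0.1, 0.1]; log_prob accounts for
        # the transform, so safe=True / manual clamping is unnecessary.
        distribution_kwargs={"low": -0.1, "high": 0.1},
        return_log_prob=True,
    ).to(device)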

Answer selected by TRPrasanna