Hi SB3 team,
Thank you for your great work on this library!
I have a question regarding the ActorCriticPolicy architecture. When I print the policy, I see three separate FlattenExtractor entries: features_extractor, pi_features_extractor, and vf_features_extractor.
ActorCriticPolicy(
  (features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (pi_features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (vf_features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (mlp_extractor): MlpExtractor(
    (policy_net): Sequential(
      (0): Linear(in_features=4, out_features=64, bias=True)
      (1): Tanh()
      (2): Linear(in_features=64, out_features=64, bias=True)
      (3): Tanh()
    )
    (value_net): Sequential(
      (0): Linear(in_features=4, out_features=64, bias=True)
      (1): Tanh()
      (2): Linear(in_features=64, out_features=64, bias=True)
      (3): Tanh()
    )
  )
  (action_net): Linear(in_features=64, out_features=2, bias=True)
  (value_net): Linear(in_features=64, out_features=1, bias=True)
)
Could you please clarify what the purpose of each of these is? Specifically:
Why are there three flatten extractors instead of just one?
What is the difference between features_extractor, pi_features_extractor, and vf_features_extractor in this context?
My guess is that features_extractor is used for shared feature extraction, while pi_features_extractor and vf_features_extractor provide separate, dedicated extraction paths for the policy and value networks, respectively. I also assume that the two modes are mutually exclusive: when the shared extractor is used, the other two are inactive, and vice versa.
Is this understanding correct?
As a suggestion: it would be helpful if the documentation included a few concrete examples showing the actual output structure of the networks. That would make it easier to understand how the components interact.
Thanks again!