Confusion about optimizer #2

@arvieFrydenlund

Hi, thanks for the code release.

I'm confused about which optimizer is selected for which parameters. The hyperbolic parameters should use the hyperbolic optimizer, right? However, it looks like the choice of optimizer depends on the parameter being a 'ManifoldParameter', and as far as I can tell none of the parameters outside the manifold values are actually instances of ManifoldParameter.

I'm not running your code directly, so I may have missed a place where these parameters are converted to ManifoldParameter or something similar.

Thanks.

This is in my code:

        euc_params = [p for p in params
                      if p.requires_grad and not isinstance(p, ManifoldParameter)]  # Euclidean parameters
        hyp_params = [p for p in params
                      if p.requires_grad and isinstance(p, ManifoldParameter)]  # Hyperbolic parameters

        logger.info("Initializing HyperbolicOptimizer")
        logger.info(f"Number of Euclidean parameters: {len(euc_params)}")
        logger.info(f"Number of Hyperbolic parameters: {len(hyp_params)}")
        logger.info(f"Euclidean optimizer: {euc_optimizer_type}, lr: {euc_lr}, weight_decay: {euc_weight_decay}")
        logger.info(f"Hyperbolic optimizer: {hyp_optimizer_type}, lr: {hyp_lr}, weight_decay: {hyp_weight_decay}")

Which gives

2025-09-24 15:46:13 | INFO | fairseq_plugins.HyperbolicTransformer.hypercore.optimizers.optimizer | Initializing HyperbolicOptimizer
2025-09-24 15:46:13 | INFO | fairseq_plugins.HyperbolicTransformer.hypercore.optimizers.optimizer | Number of Euclidean parameters: 248
2025-09-24 15:46:13 | INFO | fairseq_plugins.HyperbolicTransformer.hypercore.optimizers.optimizer | Number of Hyperbolic parameters: 2
2025-09-24 15:46:13 | INFO | fairseq_plugins.HyperbolicTransformer.hypercore.optimizers.optimizer | Euclidean optimizer: adam, lr: 0.0005, weight_decay: 0.05
2025-09-24 15:46:13 | INFO | fairseq_plugins.HyperbolicTransformer.hypercore.optimizers.optimizer | Hyperbolic optimizer: radam, lr: 0.01, weight_decay: 0.0

With these being the only two ManifoldParameter instances:

decoder.embed_tokens.poisitional_encoding <class 'geoopt.tensor.ManifoldParameter'> True
decoder.embed_tokens.embedding <class 'geoopt.tensor.ManifoldParameter'> True
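
For anyone wanting to reproduce this kind of check, here is a minimal sketch of the isinstance-based split plus a per-type count. It is not the repository's code: `split_named_params`, `count_by_type`, the `Fake*` stand-in classes, and the `decoder.layers.0.fc1.*` names are all hypothetical and exist only so the example runs without torch or geoopt. With a real model, the assumption is that you would pass `model.named_parameters()` and `geoopt.ManifoldParameter` instead.

```python
from collections import Counter


def split_named_params(named_params, manifold_cls):
    """Split (name, param) pairs into Euclidean and manifold groups via isinstance."""
    euc, hyp = [], []
    for name, p in named_params:
        if not p.requires_grad:
            continue  # frozen parameters go to neither optimizer
        (hyp if isinstance(p, manifold_cls) else euc).append((name, p))
    return euc, hyp


def count_by_type(named_params):
    """Count parameters per concrete class name, to see what the isinstance check can match."""
    return Counter(type(p).__name__ for _, p in named_params)


# Stand-in classes (hypothetical) so the sketch runs without torch or geoopt.
class FakeParameter:
    def __init__(self, requires_grad=True):
        self.requires_grad = requires_grad


class FakeManifoldParameter(FakeParameter):
    pass


named = [
    ("decoder.embed_tokens.embedding", FakeManifoldParameter()),
    ("decoder.layers.0.fc1.weight", FakeParameter()),  # hypothetical name
    ("decoder.layers.0.fc1.bias", FakeParameter(requires_grad=False)),  # hypothetical name
]

euc, hyp = split_named_params(named, FakeManifoldParameter)
print("Euclidean:", [n for n, _ in euc])
print("Hyperbolic:", [n for n, _ in hyp])
print(count_by_type(named))
```

Any parameter created as a plain parameter ends up in the Euclidean group under this check, regardless of which layer it belongs to, which would be consistent with the 248 / 2 split in the log above.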
