optimizers

Extend optimizers in HAT.


Member Summary

  • legacy_nadam_ex.LegacyNadamEx — Nadam optimizer.
  • lion.Lion — Implements the Lion algorithm <https://arxiv.org/pdf/2302.06675.pdf>.
  • optim_param_wrap.custom_param_optimizer — Return an optimizer with custom parameter settings.

API Reference

class hat.optimizers.legacy_nadam_ex.LegacyNadamEx(params: Dict, rescale_grad: float = 1.0, lr: float = 0.001, weight_decay: float = 5e-05, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08, schedule_decay: float = 0.004, wd_type: int = 3, fused: bool = False)

Nadam optimizer.

This optimizer computes weight decay in a non-standard way, but achieves noticeably better performance.

  • Parameters:
    • params – Parameters.
    • rescale_grad – Coefficient of rescale grad.
    • lr – Learning rate.
    • weight_decay (float , optional) – Weight decay coefficient. Defaults to 5e-5.
    • beta1 – Exponential decay rate for the first moment estimates.
    • beta2 – Exponential decay rate for the second moment estimates.
    • epsilon – Epsilon, small value to avoid division by 0.
    • schedule_decay – Exponential decay rate for the momentum schedule.
    • wd_type – The way to compute weight decay, possible values are {1, 2, 3}. Defaults to 3.
    • fused – Whether to use Horizon's multi_tensor_legacynadamex, which processes multiple parameters at once and is faster.

step(closure: Callable = None)

Perform a single optimization step.

  • Parameters: closure (callable , optional) – A closure that reevaluates the model and returns the loss.
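For intuition, here is a scalar sketch of the standard Nadam update (Dozat, 2016), including the schedule_decay momentum schedule. This illustrates the algorithm family only; it is not HAT's implementation and deliberately omits the class's modified weight-decay handling:

```python
import math

def nadam_step(p, g, state, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, schedule_decay=0.004):
    """One scalar Nadam update; `state` holds m, v, m_schedule, t."""
    state["t"] += 1
    t = state["t"]
    # Momentum schedule: schedule_decay controls how beta1 warms up over time.
    mu_t = beta1 * (1.0 - 0.5 * 0.96 ** (t * schedule_decay))
    mu_next = beta1 * (1.0 - 0.5 * 0.96 ** ((t + 1) * schedule_decay))
    state["m_schedule"] *= mu_t
    m_sched = state["m_schedule"]
    m_sched_next = m_sched * mu_next

    # Standard Adam-style first and second moment estimates.
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * g
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * g * g

    # Bias-corrected blend of the current gradient and look-ahead momentum.
    g_hat = g / (1.0 - m_sched)
    m_hat = state["m"] / (1.0 - m_sched_next)
    m_bar = (1.0 - mu_t) * g_hat + mu_next * m_hat
    v_hat = state["v"] / (1.0 - beta2 ** t)
    return p - lr * m_bar / (math.sqrt(v_hat) + eps)

# Minimize f(p) = p**2 (gradient 2p) for a few steps.
state = {"m": 0.0, "v": 0.0, "m_schedule": 1.0, "t": 0}
p = 1.0
for _ in range(10):
    p = nadam_step(p, 2.0 * p, state, lr=0.1)
```

The closure argument to step() follows the usual PyTorch convention: pass a callable that re-runs the forward pass and returns the loss when the optimizer needs to re-evaluate it.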

class hat.optimizers.lion.Lion(params: Dict, lr: float = 0.0001, betas: Tuple = (0.9, 0.99), weight_decay: float = 0.0)

Implements the Lion algorithm <https://arxiv.org/pdf/2302.06675.pdf>.

  • Parameters:
    • params – iterable of parameters to optimize or dicts defining parameter groups
    • lr – learning rate (default: 1e-4)
    • betas – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.99))
    • weight_decay – weight decay coefficient (default: 0)

Note: Docs: https://horizonrobotics.feishu.cn/docx/AsStdmRXIoeSyZxk9OPccxvnn0u

step(closure=None)

Perform a single optimization step.

  • Parameters: closure (callable , optional) – A closure that reevaluates the model and returns the loss.
  • Returns: the loss.
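The Lion update is simple enough to sketch in a few lines. This is a scalar version of the rule stated in the paper (sign of a beta1-interpolation of momentum and gradient, with decoupled weight decay), not HAT's actual tensor implementation:

```python
def lion_step(p, g, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One scalar Lion update (Chen et al., 2023)."""
    sign = lambda x: (x > 0) - (x < 0)
    # Direction is the sign of a beta1-interpolation of momentum and gradient.
    update = sign(beta1 * m + (1.0 - beta1) * g)
    # Decoupled weight decay, applied directly to the parameter (as in AdamW).
    p = p - lr * (update + weight_decay * p)
    # Momentum tracks the gradient with the slower rate beta2.
    m = beta2 * m + (1.0 - beta2) * g
    return p, m

p, m = lion_step(1.0, 2.0, 0.0)
```

Because the update is only a sign, Lion's step magnitude is controlled entirely by lr, which is why its default (1e-4) is smaller than typical Adam-family defaults.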

hat.optimizers.optim_param_wrap.custom_param_optimizer(optim_cls: Type[Optimizer], optim_cfgs: Dict, custom_param_mapper: Dict[str | Type[Module] | Tuple, Dict[str, float]])

Return optimizer with custom params setting.

  • Parameters:
    • optim_cls – The wrapped optimizer class, e.g., torch.optim.SGD.
    • model – The model instance that will be trained.
    • optim_cfgs – The configuration for the basic optimizer, e.g., lr.
    • custom_param_mapper – A dictionary for custom mapping between model parameters and optimizer parameters.

The custom_param_mapper has the following key characteristics:

Key Matching:

  1. Class of torch.nn.Module: a class key directly matches the corresponding parameters of the model.
  2. Predefined types: keys can be chosen from predefined type groups. Supported groups include ["norm_types", ].
  3. String match: string keys are matched against parameter names.
  4. Tuple: a tuple combining any of the previous three kinds of keys.

Value Setting: Optimizer parameters can be set, e.g., {"weight_decay": 1e-4, "lr": 0.01}.

Example:

>>> custom_param_mapper = {
...     "norm_types": {"weight_decay": 0},
...     nn.Conv2d: {"weight_decay": 1e-4, "lr": 0.01},
...     "bias": {"weight_decay": 1e-5, "lr": 0.1},
...     (nn.Conv2d, "bias"): {"weight_decay": 0},
... }
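The key-matching rules above can be sketched as a small resolver. Everything below is illustrative: the function name, the "all sub-keys must match" reading of tuple keys, and the norm-type handling are assumptions for the sketch, not HAT's actual code (dummy classes stand in for torch.nn modules so the example is self-contained):

```python
def resolve_overrides(param_name, owner_cls, mapper, norm_types=()):
    """Merge optimizer-arg overrides for a single parameter.

    Keys are matched as described above: a module class, the predefined
    group "norm_types", a substring of the parameter name, or a tuple
    (assumed here to require all sub-keys to match). Later mapper
    entries win on conflict.
    """
    def matches(key):
        if key == "norm_types":                  # predefined type group
            return bool(norm_types) and issubclass(owner_cls, tuple(norm_types))
        if isinstance(key, str):                 # substring of the param name
            return key in param_name
        if isinstance(key, type):                # owning module's class
            return issubclass(owner_cls, key)
        if isinstance(key, tuple):               # all sub-keys must match
            return all(matches(k) for k in key)
        return False

    overrides = {}
    for key, cfg in mapper.items():
        if matches(key):
            overrides.update(cfg)
    return overrides

# Dummy stand-ins for torch.nn classes (illustration only).
class Conv2d:
    pass

class BatchNorm2d:
    pass

mapper = {
    "norm_types": {"weight_decay": 0},
    Conv2d: {"weight_decay": 1e-4, "lr": 0.01},
    "bias": {"weight_decay": 1e-5, "lr": 0.1},
    (Conv2d, "bias"): {"weight_decay": 0},
}
cfg = resolve_overrides("backbone.conv1.bias", Conv2d, mapper,
                        norm_types=(BatchNorm2d,))
```

For a Conv2d bias, the class key, the "bias" key, and the tuple key all match, so the tuple entry's weight_decay of 0 overrides the earlier values while the "bias" entry's lr of 0.1 survives.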