timecast.optim package¶
Submodules¶
timecast.optim.core module¶
Module contents¶
timecast.optim
class timecast.optim.Adagrad(learning_rate: float = None, eps=1e-08)[source]¶
Bases: flax.optim.base.OptimizerDef
Adagrad optimizer.
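Example (a minimal usage sketch; create, target, and apply_gradient are the interface inherited from flax.optim.base.OptimizerDef, and the parameter pytree below is hypothetical):

    import jax.numpy as jnp
    from timecast.optim import Adagrad

    # Wrap a hypothetical parameter pytree in an optimizer
    params = {"w": jnp.zeros(3)}
    optimizer = Adagrad(learning_rate=0.1, eps=1e-8).create(params)

    # One update step; apply_gradient returns a new Optimizer
    grads = {"w": jnp.ones(3)}
    optimizer = optimizer.apply_gradient(grads)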
class timecast.optim.Adam(learning_rate=None, beta1=0.9, beta2=0.999, eps=1e-08, weight_decay=0.0)[source]¶
Bases: flax.optim.base.OptimizerDef
Adam optimizer.
apply_param_gradient(step, hyper_params, param, state, grad)[source]¶
Apply a gradient for a single parameter.
Parameters
    step – the current step of the optimizer.
    hyper_params – a named tuple of hyper parameters.
    param – the parameter that should be updated.
    state – a named tuple containing the state for this parameter.
    grad – the gradient tensor for the parameter.
Returns
    A tuple containing the new parameter and the new state.
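Example (a sketch of one training step; the loss function and data are hypothetical, and the create/target/apply_gradient workflow is inherited from flax.optim.base.OptimizerDef):

    import jax
    import jax.numpy as jnp
    from timecast.optim import Adam

    # Hypothetical least-squares loss on a linear predictor
    def loss_fn(params, x, y):
        return jnp.mean((x @ params["w"] - y) ** 2)

    params = {"w": jnp.zeros(4)}
    optimizer = Adam(learning_rate=1e-3, beta1=0.9, beta2=0.999).create(params)

    # Differentiate the loss with respect to the current parameters
    x, y = jnp.ones((8, 4)), jnp.zeros(8)
    grads = jax.grad(loss_fn)(optimizer.target, x, y)
    optimizer = optimizer.apply_gradient(grads)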
class timecast.optim.GradientDescent(learning_rate=None)[source]¶
Bases: flax.optim.base.OptimizerDef
Gradient descent optimizer.
apply_param_gradient(step, hyper_params, param, state, grad)[source]¶
Apply a gradient for a single parameter.
Parameters
    step – the current step of the optimizer.
    hyper_params – a named tuple of hyper parameters.
    param – the parameter that should be updated.
    state – a named tuple containing the state for this parameter.
    grad – the gradient tensor for the parameter.
Returns
    A tuple containing the new parameter and the new state.
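Example (a sketch assuming the inherited flax.optim interface; for plain gradient descent the per-parameter update reduces to new_param = param - learning_rate * grad, which the last lines check by hand):

    import jax.numpy as jnp
    from timecast.optim import GradientDescent

    params = {"w": jnp.array([1.0, -2.0])}
    grads = {"w": jnp.array([0.5, 0.5])}

    optimizer = GradientDescent(learning_rate=0.1).create(params)
    optimizer = optimizer.apply_gradient(grads)

    # Equivalent manual update
    manual = params["w"] - 0.1 * grads["w"]
    assert jnp.allclose(optimizer.target["w"], manual)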
class timecast.optim.LAMB(learning_rate=None, beta1=0.9, beta2=0.999, weight_decay=0, eps=1e-06)[source]¶
Bases: flax.optim.base.OptimizerDef
Layerwise adaptive moments for batch (LAMB) optimizer.
See https://arxiv.org/abs/1904.00962
apply_param_gradient(step, hyper_params, param, state, grad)[source]¶
Apply a gradient for a single parameter.
Parameters
    step – the current step of the optimizer.
    hyper_params – a named tuple of hyper parameters.
    param – the parameter that should be updated.
    state – a named tuple containing the state for this parameter.
    grad – the gradient tensor for the parameter.
Returns
    A tuple containing the new parameter and the new state.
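Example (a construction sketch; hyperparameters mirror the signature above, the values and parameter shapes are illustrative, and create/apply_gradient come from the inherited flax.optim interface):

    import jax.numpy as jnp
    from timecast.optim import LAMB

    params = {"w": jnp.zeros((16, 16))}
    optimizer = LAMB(
        learning_rate=1e-3,
        beta1=0.9,
        beta2=0.999,
        weight_decay=0.01,  # illustrative; the default is 0
        eps=1e-6,
    ).create(params)

    grads = {"w": jnp.full((16, 16), 0.01)}
    optimizer = optimizer.apply_gradient(grads)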
class timecast.optim.LARS(learning_rate=None, beta=0.9, weight_decay=0, trust_coefficient=0.001, eps=0, nesterov=False)[source]¶
Bases: flax.optim.base.OptimizerDef
Layerwise adaptive rate scaling (LARS) optimizer.
See https://arxiv.org/abs/1708.03888
apply_param_gradient(step, hyper_params, param, state, grad)[source]¶
Apply a gradient for a single parameter.
Parameters
    step – the current step of the optimizer.
    hyper_params – a named tuple of hyper parameters.
    param – the parameter that should be updated.
    state – a named tuple containing the state for this parameter.
    grad – the gradient tensor for the parameter.
Returns
    A tuple containing the new parameter and the new state.
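Example (a construction sketch under the same assumptions as above; values are illustrative, and per the LARS paper the trust coefficient scales each layer's effective learning rate):

    import jax.numpy as jnp
    from timecast.optim import LARS

    params = {"w": jnp.zeros((16, 16))}
    optimizer = LARS(
        learning_rate=0.1,
        beta=0.9,
        trust_coefficient=0.001,  # scales the layerwise learning rate
        nesterov=False,
    ).create(params)

    grads = {"w": jnp.full((16, 16), 0.01)}
    optimizer = optimizer.apply_gradient(grads)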
class timecast.optim.Momentum(learning_rate=None, beta=0.9, weight_decay=0, nesterov=False)[source]¶
Bases: flax.optim.base.OptimizerDef
Momentum optimizer.
apply_param_gradient(step, hyper_params, param, state, grad)[source]¶
Apply a gradient for a single parameter.
Parameters
    step – the current step of the optimizer.
    hyper_params – a named tuple of hyper parameters.
    param – the parameter that should be updated.
    state – a named tuple containing the state for this parameter.
    grad – the gradient tensor for the parameter.
Returns
    A tuple containing the new parameter and the new state.
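Example (a minimal sketch, assuming the inherited flax.optim interface; the velocity state carried per parameter is what distinguishes successive steps from plain gradient descent):

    import jax.numpy as jnp
    from timecast.optim import Momentum

    params = {"w": jnp.zeros(4)}
    optimizer = Momentum(learning_rate=0.01, beta=0.9, nesterov=True).create(params)

    # Two steps with the same gradient; the second moves further
    # because the velocity state accumulates
    grads = {"w": jnp.ones(4)}
    optimizer = optimizer.apply_gradient(grads)
    optimizer = optimizer.apply_gradient(grads)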
class timecast.optim.MultiplicativeWeights(eta: float = None)[source]¶
Bases: flax.optim.base.OptimizerDef
Multiplicative weights optimizer.
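Example (a calling-convention sketch only; the update rule itself is not documented here. Multiplicative-weights methods typically maintain a distribution over experts, so the pytree below is a hypothetical expert weighting and the per-expert losses play the role of the gradient):

    import jax.numpy as jnp
    from timecast.optim import MultiplicativeWeights

    # Hypothetical uniform distribution over four experts
    weights = {"experts": jnp.ones(4) / 4}
    optimizer = MultiplicativeWeights(eta=0.1).create(weights)

    # Per-expert losses supplied in place of a gradient
    losses = {"experts": jnp.array([0.2, 0.5, 0.1, 0.4])}
    optimizer = optimizer.apply_gradient(losses)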
class timecast.optim.ProjectedSGD(learning_rate: float = None, projection_threshold: float = None)[source]¶
Bases: flax.optim.base.OptimizerDef
Gradient descent optimizer with projections.
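Example (a minimal sketch under the same inherited-interface assumptions; our reading of the signature is that projection_threshold bounds the feasible region parameters are projected back into after each step, though the exact projection is not documented here):

    import jax.numpy as jnp
    from timecast.optim import ProjectedSGD

    params = {"w": jnp.zeros(8)}
    optimizer = ProjectedSGD(
        learning_rate=0.01,
        projection_threshold=1.0,  # assumed bound on the feasible region
    ).create(params)

    grads = {"w": jnp.ones(8)}
    optimizer = optimizer.apply_gradient(grads)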