timecast.optim package¶
Submodules¶
timecast.optim.core module¶
Module contents¶
timecast.optim
class timecast.optim.Adagrad(learning_rate: float = None, eps=1e-08)[source]¶
Bases: flax.optim.base.OptimizerDef
Adagrad optimizer.
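Example (a minimal usage sketch; create, target, and apply_gradient are the interface inherited from flax.optim.base.OptimizerDef, and the parameter pytree below is hypothetical):

    import jax.numpy as jnp
    from timecast.optim import Adagrad

    # Wrap a hypothetical parameter pytree in an optimizer
    params = {"w": jnp.zeros(3)}
    optimizer = Adagrad(learning_rate=0.1, eps=1e-8).create(params)

    # One update step; apply_gradient returns a new Optimizer
    grads = {"w": jnp.ones(3)}
    optimizer = optimizer.apply_gradient(grads)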
class timecast.optim.Adam(learning_rate=None, beta1=0.9, beta2=0.999, eps=1e-08, weight_decay=0.0)[source]¶
Bases: flax.optim.base.OptimizerDef
Adam optimizer.
apply_param_gradient(step, hyper_params, param, state, grad)[source]¶
Apply a gradient for a single parameter.
Parameters
    step – the current step of the optimizer.
    hyper_params – a named tuple of hyper parameters.
    param – the parameter that should be updated.
    state – a named tuple containing the state for this parameter.
    grad – the gradient tensor for the parameter.
Returns
    A tuple containing the new parameter and the new state.
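Example (a sketch of one training step; the loss function and data are hypothetical, and the create/target/apply_gradient workflow is inherited from flax.optim.base.OptimizerDef):

    import jax
    import jax.numpy as jnp
    from timecast.optim import Adam

    # Hypothetical least-squares loss on a linear predictor
    def loss_fn(params, x, y):
        return jnp.mean((x @ params["w"] - y) ** 2)

    params = {"w": jnp.zeros(4)}
    optimizer = Adam(learning_rate=1e-3, beta1=0.9, beta2=0.999).create(params)

    # Differentiate the loss with respect to the current parameters
    x, y = jnp.ones((8, 4)), jnp.zeros(8)
    grads = jax.grad(loss_fn)(optimizer.target, x, y)
    optimizer = optimizer.apply_gradient(grads)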
class timecast.optim.GradientDescent(learning_rate=None)[source]¶
Bases: flax.optim.base.OptimizerDef
Gradient descent optimizer.
apply_param_gradient(step, hyper_params, param, state, grad)[source]¶
Apply a gradient for a single parameter.
Parameters
    step – the current step of the optimizer.
    hyper_params – a named tuple of hyper parameters.
    param – the parameter that should be updated.
    state – a named tuple containing the state for this parameter.
    grad – the gradient tensor for the parameter.
Returns
    A tuple containing the new parameter and the new state.
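Example (a sketch assuming the inherited flax.optim interface; for plain gradient descent the per-parameter update reduces to new_param = param - learning_rate * grad, which the last lines check by hand):

    import jax.numpy as jnp
    from timecast.optim import GradientDescent

    params = {"w": jnp.array([1.0, -2.0])}
    grads = {"w": jnp.array([0.5, 0.5])}

    optimizer = GradientDescent(learning_rate=0.1).create(params)
    optimizer = optimizer.apply_gradient(grads)

    # Equivalent manual update
    manual = params["w"] - 0.1 * grads["w"]
    assert jnp.allclose(optimizer.target["w"], manual)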
class timecast.optim.LAMB(learning_rate=None, beta1=0.9, beta2=0.999, weight_decay=0, eps=1e-06)[source]¶
Bases: flax.optim.base.OptimizerDef
Layerwise adaptive moments for batch (LAMB) optimizer.
See https://arxiv.org/abs/1904.00962
apply_param_gradient(step, hyper_params, param, state, grad)[source]¶
Apply a gradient for a single parameter.
Parameters
    step – the current step of the optimizer.
    hyper_params – a named tuple of hyper parameters.
    param – the parameter that should be updated.
    state – a named tuple containing the state for this parameter.
    grad – the gradient tensor for the parameter.
Returns
    A tuple containing the new parameter and the new state.
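Example (a construction sketch; hyperparameters mirror the signature above, the values and parameter shapes are illustrative, and create/apply_gradient come from the inherited flax.optim interface):

    import jax.numpy as jnp
    from timecast.optim import LAMB

    params = {"w": jnp.zeros((16, 16))}
    optimizer = LAMB(
        learning_rate=1e-3,
        beta1=0.9,
        beta2=0.999,
        weight_decay=0.01,  # illustrative; the default is 0
        eps=1e-6,
    ).create(params)

    grads = {"w": jnp.full((16, 16), 0.01)}
    optimizer = optimizer.apply_gradient(grads)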
class timecast.optim.LARS(learning_rate=None, beta=0.9, weight_decay=0, trust_coefficient=0.001, eps=0, nesterov=False)[source]¶
Bases: flax.optim.base.OptimizerDef
Layerwise adaptive rate scaling (LARS) optimizer.
See https://arxiv.org/abs/1708.03888
apply_param_gradient(step, hyper_params, param, state, grad)[source]¶
Apply a gradient for a single parameter.
Parameters
    step – the current step of the optimizer.
    hyper_params – a named tuple of hyper parameters.
    param – the parameter that should be updated.
    state – a named tuple containing the state for this parameter.
    grad – the gradient tensor for the parameter.
Returns
    A tuple containing the new parameter and the new state.
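Example (a construction sketch under the same assumptions as above; values are illustrative, and per the LARS paper the trust coefficient scales each layer's effective learning rate):

    import jax.numpy as jnp
    from timecast.optim import LARS

    params = {"w": jnp.zeros((16, 16))}
    optimizer = LARS(
        learning_rate=0.1,
        beta=0.9,
        trust_coefficient=0.001,  # scales the layerwise learning rate
        nesterov=False,
    ).create(params)

    grads = {"w": jnp.full((16, 16), 0.01)}
    optimizer = optimizer.apply_gradient(grads)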
class timecast.optim.Momentum(learning_rate=None, beta=0.9, weight_decay=0, nesterov=False)[source]¶
Bases: flax.optim.base.OptimizerDef
Momentum optimizer.
apply_param_gradient(step, hyper_params, param, state, grad)[source]¶
Apply a gradient for a single parameter.
Parameters
    step – the current step of the optimizer.
    hyper_params – a named tuple of hyper parameters.
    param – the parameter that should be updated.
    state – a named tuple containing the state for this parameter.
    grad – the gradient tensor for the parameter.
Returns
    A tuple containing the new parameter and the new state.
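Example (a minimal sketch, assuming the inherited flax.optim interface; the velocity state carried per parameter is what distinguishes successive steps from plain gradient descent):

    import jax.numpy as jnp
    from timecast.optim import Momentum

    params = {"w": jnp.zeros(4)}
    optimizer = Momentum(learning_rate=0.01, beta=0.9, nesterov=True).create(params)

    # Two steps with the same gradient; the second moves further
    # because the velocity state accumulates
    grads = {"w": jnp.ones(4)}
    optimizer = optimizer.apply_gradient(grads)
    optimizer = optimizer.apply_gradient(grads)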
class timecast.optim.MultiplicativeWeights(eta: float = None)[source]¶
Bases: flax.optim.base.OptimizerDef
Multiplicative weights optimizer.
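Example (a calling-convention sketch only; the update rule itself is not documented here. Multiplicative-weights methods typically maintain a distribution over experts, so the pytree below is a hypothetical expert weighting and the per-expert losses play the role of the gradient):

    import jax.numpy as jnp
    from timecast.optim import MultiplicativeWeights

    # Hypothetical uniform distribution over four experts
    weights = {"experts": jnp.ones(4) / 4}
    optimizer = MultiplicativeWeights(eta=0.1).create(weights)

    # Per-expert losses supplied in place of a gradient
    losses = {"experts": jnp.array([0.2, 0.5, 0.1, 0.4])}
    optimizer = optimizer.apply_gradient(losses)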
class timecast.optim.ProjectedSGD(learning_rate: float = None, projection_threshold: float = None)[source]¶
Bases: flax.optim.base.OptimizerDef
Gradient descent optimizer with projections.
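Example (a minimal sketch under the same inherited-interface assumptions; our reading of the signature is that projection_threshold bounds the feasible region parameters are projected back into after each step, though the exact projection is not documented here):

    import jax.numpy as jnp
    from timecast.optim import ProjectedSGD

    params = {"w": jnp.zeros(8)}
    optimizer = ProjectedSGD(
        learning_rate=0.01,
        projection_threshold=1.0,  # assumed bound on the feasible region
    ).create(params)

    grads = {"w": jnp.ones(8)}
    optimizer = optimizer.apply_gradient(grads)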