timecast.optim package

Submodules

timecast.optim.core module

Module contents

timecast.optim

class timecast.optim.Adagrad(learning_rate: float = None, eps=1e-08)[source]

Bases: flax.optim.base.OptimizerDef

Adagrad optimizer

apply_param_gradient(step, hyper_params, param, state, grad)[source]

Apply per-parameter gradients

init_param_state(param)[source]

Initialize parameter state
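
Because every optimizer here derives from flax.optim.base.OptimizerDef, it is expected to fit the usual (pre-Linen) flax.optim workflow: wrap a parameter pytree with create(), then call apply_gradient() each step, which in turn invokes the per-parameter apply_param_gradient() documented above. The sketch below illustrates that workflow with Adagrad; the linear-model parameters and loss function are illustrative placeholders, not part of timecast.

    import jax
    import jax.numpy as jnp
    from timecast.optim import Adagrad

    # Hypothetical linear-model parameters; any pytree of arrays works
    params = {"w": jnp.zeros((3,)), "b": jnp.zeros(())}

    def loss(params, x, y):
        pred = x @ params["w"] + params["b"]
        return jnp.mean((pred - y) ** 2)

    # OptimizerDef -> Optimizer wrapping the target parameters
    optimizer = Adagrad(learning_rate=0.1).create(params)

    x, y = jnp.ones((8, 3)), jnp.ones((8,))
    grads = jax.grad(loss)(optimizer.target, x, y)
    # apply_gradient updates every parameter leaf via apply_param_gradient
    optimizer = optimizer.apply_gradient(grads)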

class timecast.optim.Adam(learning_rate=None, beta1=0.9, beta2=0.999, eps=1e-08, weight_decay=0.0)[source]

Bases: flax.optim.base.OptimizerDef

Adam optimizer.

apply_param_gradient(step, hyper_params, param, state, grad)[source]

Apply a gradient for a single parameter.

Parameters
  • step – the current step of the optimizer.

  • hyper_params – a named tuple of hyper parameters.

  • param – the parameter that should be updated.

  • state – a named tuple containing the state for this parameter.

  • grad – the gradient tensor for the parameter.

Returns

A tuple containing the new parameter and the new state.

init_param_state(param)[source]

Initializes the state for a parameter.

Parameters

param – the parameter for which to initialize the state.

Returns

A named tuple containing the initial optimization state for the parameter.
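
For reference, a minimal sketch of the textbook Adam recurrences that init_param_state and apply_param_gradient are documented to implement is shown below. The dictionary-based state, the field names m and v, and the omission of weight_decay handling are illustrative assumptions; the actual named tuples used by timecast/flax may differ.

    import jax.numpy as jnp

    def init_param_state(param):
        # First and second moment estimates, both zero-initialized
        return {"m": jnp.zeros_like(param), "v": jnp.zeros_like(param)}

    def apply_param_gradient(step, hp, param, state, grad):
        t = step + 1
        m = hp["beta1"] * state["m"] + (1.0 - hp["beta1"]) * grad
        v = hp["beta2"] * state["v"] + (1.0 - hp["beta2"]) * grad ** 2
        m_hat = m / (1.0 - hp["beta1"] ** t)   # bias correction
        v_hat = v / (1.0 - hp["beta2"] ** t)
        new_param = param - hp["learning_rate"] * m_hat / (jnp.sqrt(v_hat) + hp["eps"])
        # weight_decay handling omitted; see the flax source for the exact form
        return new_param, {"m": m, "v": v}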

class timecast.optim.GradientDescent(learning_rate=None)[source]

Bases: flax.optim.base.OptimizerDef

Gradient descent optimizer.

apply_param_gradient(step, hyper_params, param, state, grad)[source]

Apply a gradient for a single parameter.

Parameters
  • step – the current step of the optimizer.

  • hyper_params – a named tuple of hyper parameters.

  • param – the parameter that should be updated.

  • state – a named tuple containing the state for this parameter.

  • grad – the gradient tensor for the parameter.

Returns

A tuple containing the new parameter and the new state.

init_param_state(param)[source]

Initializes the state for a parameter.

Parameters

param – the parameter for which to initialize the state.

Returns

A named tuple containing the initial optimization state for the parameter.

class timecast.optim.LAMB(learning_rate=None, beta1=0.9, beta2=0.999, weight_decay=0, eps=1e-06)[source]

Bases: flax.optim.base.OptimizerDef

Layerwise adaptive moments for batch (LAMB) optimizer.

See https://arxiv.org/abs/1904.00962

apply_param_gradient(step, hyper_params, param, state, grad)[source]

Apply a gradient for a single parameter.

Parameters
  • step – the current step of the optimizer.

  • hyper_params – a named tuple of hyper parameters.

  • param – the parameter that should be updated.

  • state – a named tuple containing the state for this parameter.

  • grad – the gradient tensor for the parameter.

Returns

A tuple containing the new parameter and the new state.

init_param_state(param)[source]

Initializes the state for a parameter.

Parameters

param – the parameter for which to initialize the state.

Returns

A named tuple containing the initial optimization state for the parameter.

class timecast.optim.LARS(learning_rate=None, beta=0.9, weight_decay=0, trust_coefficient=0.001, eps=0, nesterov=False)[source]

Bases: flax.optim.base.OptimizerDef

Layerwise adaptive rate scaling (LARS) optimizer.

See https://arxiv.org/abs/1708.03888

apply_param_gradient(step, hyper_params, param, state, grad)[source]

Apply a gradient for a single parameter.

Parameters
  • step – the current step of the optimizer.

  • hyper_params – a named tuple of hyper parameters.

  • param – the parameter that should be updated.

  • state – a named tuple containing the state for this parameter.

  • grad – the gradient tensor for the parameter.

Returns

A tuple containing the new parameter and the new state.

init_param_state(param)[source]

Initializes the state for a parameter.

Parameters

param – the parameter for which to initialize the state.

Returns

A named tuple containing the initial optimization state for the parameter.
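
Both LAMB and LARS scale each layer's step by a trust ratio of parameter norm to gradient norm. A sketch of that scaling, in the spirit of arXiv:1708.03888, is given below; the guard against zero norms and the exact placement of weight decay are assumptions, not a transcription of timecast's implementation.

    import jax.numpy as jnp

    def lars_step(param, grad, momentum, hp):
        param_norm = jnp.linalg.norm(param)
        grad_norm = jnp.linalg.norm(grad)
        # Layerwise trust ratio: larger parameters tolerate larger steps
        trust = jnp.where(
            (param_norm > 0.0) & (grad_norm > 0.0),
            hp["trust_coefficient"] * param_norm
            / (grad_norm + hp["weight_decay"] * param_norm + hp["eps"]),
            1.0,
        )
        update = trust * (grad + hp["weight_decay"] * param)
        new_momentum = hp["beta"] * momentum + update
        return param - hp["learning_rate"] * new_momentum, new_momentum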

class timecast.optim.Momentum(learning_rate=None, beta=0.9, weight_decay=0, nesterov=False)[source]

Bases: flax.optim.base.OptimizerDef

Momentum optimizer.

apply_param_gradient(step, hyper_params, param, state, grad)[source]

Apply a gradient for a single parameter.

Parameters
  • step – the current step of the optimizer.

  • hyper_params – a named tuple of hyper parameters.

  • param – the parameter that should be updated.

  • state – a named tuple containing the state for this parameter.

  • grad – the gradient tensor for the parameter.

Returns

A tuple containing the new parameter and the new state.

init_param_state(param)[source]

Initializes the state for a parameter.

Parameters

param – the parameter for which to initialize the state.

Returns

A named tuple containing the initial optimization state for the parameter.

class timecast.optim.MultiplicativeWeights(eta: float = None)[source]

Bases: flax.optim.base.OptimizerDef

Multiplicative weights optimizer

apply_param_gradient(step, hyper_params, param, state, grad)[source]

Apply per-parameter gradients

init_param_state(param)[source]

Initialize parameter state
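
One plausible reading of this optimizer is the exponentiated-gradient (multiplicative weights) update sketched below, with eta playing the role of the step size. Whether timecast renormalizes the weights onto the probability simplex, as done here, is an assumption to check against the source.

    import jax.numpy as jnp

    def multiplicative_weights_step(param, grad, eta):
        new_param = param * jnp.exp(-eta * grad)   # exponentiated-gradient step
        return new_param / jnp.sum(new_param)      # renormalize onto the simplex (assumed)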

class timecast.optim.ProjectedSGD(learning_rate: float = None, projection_threshold: float = None)[source]

Bases: flax.optim.base.OptimizerDef

Gradient descent optimizer with projections.

apply_param_gradient(step, hyper_params, param, state, grad)[source]

Apply per-parameter gradients

init_param_state(param)[source]

Initialize parameter state
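
A minimal sketch of one way to read "gradient descent with projections": take a plain SGD step, then project the parameter back onto an L2 ball of radius projection_threshold. The choice of an L2 ball and a per-parameter projection is an assumption, not something confirmed by this page.

    import jax.numpy as jnp

    def projected_sgd_step(param, grad, learning_rate, projection_threshold):
        new_param = param - learning_rate * grad
        norm = jnp.linalg.norm(new_param)
        # Rescale only if the step left the ball of radius projection_threshold
        scale = jnp.minimum(1.0, projection_threshold / (norm + 1e-12))
        return new_param * scale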

class timecast.optim.RMSProp(learning_rate: float = None, beta2=0.999, eps=1e-08)[source]

Bases: flax.optim.base.OptimizerDef

RMSProp optimizer

apply_param_gradient(step, hyper_params, param, state, grad)[source]

Apply per-parameter gradients

init_param_state(param)[source]

Initialize parameter state
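
For completeness, the standard RMSProp recurrences corresponding to the hyperparameters listed above are sketched below; the state layout (a single second-moment accumulator v) is an assumption about the implementation.

    import jax.numpy as jnp

    def init_param_state(param):
        return {"v": jnp.zeros_like(param)}   # running average of squared gradients

    def apply_param_gradient(step, hp, param, state, grad):
        v = hp["beta2"] * state["v"] + (1.0 - hp["beta2"]) * grad ** 2
        new_param = param - hp["learning_rate"] * grad / (jnp.sqrt(v) + hp["eps"])
        return new_param, {"v": v}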