@rl-js/interfaces

Interfaces

ActionTraces

Kind: global interface

ActionTraces
- .record(state, action) ⇒ ActionTraces
- .update(error) ⇒ ActionTraces
- .decay(amount) ⇒ ActionTraces
- .reset() ⇒ ActionTraces

actionTraces.record(state, action) ⇒ ActionTraces

Records a trace for the given state-action pair.

Kind: instance method of ActionTraces
Returns: ActionTraces - This object

Param	Type	Description
state	*	State object of type specific to the environment
action	*	Action object of type specific to the environment

actionTraces.update(error) ⇒ ActionTraces

Updates the value function based on the stored traces, and the given error.

Kind: instance method of ActionTraces
Returns: ActionTraces - This object

Param	Type	Description
error	number	The current TD error

actionTraces.decay(amount) ⇒ ActionTraces

Decay the traces by the given amount.

Kind: instance method of ActionTraces
Returns: ActionTraces - This object

Param	Type	Description
amount	number	The amount to multiply the traces by, usually a value less than 1.

actionTraces.reset() ⇒ ActionTraces

Reset the traces to their starting values. Usually called at the beginning of an episode.

Kind: instance method of ActionTraces
Returns: ActionTraces - This object
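
As a minimal sketch of how the four methods could fit together, the following assumes a tabular setting with accumulating traces. The class name `TabularActionTraces`, the JSON-based keys, and the stub value function are illustrative assumptions and are not part of @rl-js/interfaces.

```javascript
// Hypothetical tabular implementation of the ActionTraces interface.
// Assumes states and actions are JSON-serializable.
class TabularActionTraces {
  constructor(actionValueFunction) {
    this.actionValueFunction = actionValueFunction;
    this.traces = new Map(); // key -> { state, action, value }
  }

  record(state, action) {
    const key = JSON.stringify([state, action]);
    const entry = this.traces.get(key) || { state, action, value: 0 };
    entry.value += 1; // accumulating trace
    this.traces.set(key, entry);
    return this;
  }

  update(error) {
    // Scale the error by each trace before passing it to the value function.
    for (const { state, action, value } of this.traces.values()) {
      this.actionValueFunction.update(state, action, error * value);
    }
    return this;
  }

  decay(amount) {
    for (const entry of this.traces.values()) {
      entry.value *= amount;
    }
    return this;
  }

  reset() {
    this.traces.clear();
    return this;
  }
}

// Stub value function that only records the updates it receives (illustration only).
const updates = [];
const stubQ = { update: (state, action, error) => updates.push([state, action, error]) };

const traces = new TabularActionTraces(stubQ);
traces.record('s0', 'LEFT').decay(0.5).record('s1', 'RIGHT').update(2);
```

Because every method returns the traces object, the calls can be chained as in the last line.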

ActionValueFunction ⇐ FunctionApproximator

Kind: global interface
Extends: FunctionApproximator

ActionValueFunction ⇐ FunctionApproximator
- .call(state, action) ⇒ number
- .update(state, action, error)
- .gradient(state, action) ⇒ Array.<number>
- .getParameters() ⇒ Array.<number>
- .setParameters(parameters)
- .updateParameters(errors)

actionValueFunction.call(state, action) ⇒ number

Estimate the expected value of the returns given a specific state-action pair

Kind: instance method of ActionValueFunction
Overrides: call
Returns: number - The approximated action value (q)

Param	Type	Description
state	*	State object of type specific to the environment
action	*	Action object of type specific to the environment

actionValueFunction.update(state, action, error)

Update the value of the function approximator for a given state-action pair

Kind: instance method of ActionValueFunction
Overrides: update

Param	Type	Description
state	*	State object of type specific to the environment
action	*	Action object of type specific to the environment
error	number	The difference between the target value and the currently approximated value

actionValueFunction.gradient(state, action) ⇒ Array.<number>

Compute the gradient of the function approximator for a given state-action pair, with respect to its parameters.

Kind: instance method of ActionValueFunction
Overrides: gradient
Returns: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point

Param	Type	Description
state	*	State object of type specific to the environment
action	*	Action object of type specific to the environment

actionValueFunction.getParameters() ⇒ Array.<number>

Get the differentiable parameters of the function approximator

Kind: instance method of ActionValueFunction
Returns: Array.<number> - The parameters that define the function approximator

actionValueFunction.setParameters(parameters)

Set the differentiable parameters of the function approximator

Kind: instance method of ActionValueFunction

Param	Type	Description
parameters	Array.<number>	new parameters for the function approximator

actionValueFunction.updateParameters(errors)

Update the parameters in some direction given by an array of errors.

Kind: instance method of ActionValueFunction

Param	Type	Description
errors	Array.<number>	The direction with which to update each parameter
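
One possible shape for an implementation is a linear approximator, sketched below. It assumes states are numeric feature arrays and actions are integer indices. The class name `LinearActionValueFunction`, the constructor options, and the step size `alpha` are assumptions made for illustration and are not defined by the interface.

```javascript
// Hypothetical linear action-value function: q(s, a) = w_a . s
// Assumes states are numeric feature arrays and actions are integer indices.
class LinearActionValueFunction {
  constructor({ features, actions, alpha = 0.1 }) {
    this.features = features;
    this.actions = actions;
    this.alpha = alpha; // step size (an assumption; not part of the interface)
    this.weights = new Array(features * actions).fill(0);
  }

  call(state, action) {
    const offset = action * this.features;
    return state.reduce((sum, x, i) => sum + this.weights[offset + i] * x, 0);
  }

  gradient(state, action) {
    // For a linear model, the gradient is the feature vector placed in
    // the block of weights that belongs to the chosen action.
    const grad = new Array(this.weights.length).fill(0);
    const offset = action * this.features;
    state.forEach((x, i) => { grad[offset + i] = x; });
    return grad;
  }

  update(state, action, error) {
    this.updateParameters(this.gradient(state, action).map(g => g * error));
  }

  getParameters() {
    return this.weights.slice();
  }

  setParameters(parameters) {
    this.weights = parameters.slice();
  }

  updateParameters(errors) {
    this.weights = this.weights.map((w, i) => w + this.alpha * errors[i]);
  }
}

const q = new LinearActionValueFunction({ features: 2, actions: 2, alpha: 0.1 });
const state = [1, 2];
const target = 10;
// The error is the difference between the target and the current estimate.
q.update(state, 1, target - q.call(state, 1));
```

In this sketch, `update` is expressed in terms of `gradient` and `updateParameters`, so only the weights of the chosen action move.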

AgentFactory

Kind: global interface

agentFactory.createAgent() ⇒ Agent

Kind: instance method of AgentFactory

Agent

Kind: global interface

Agent
- .newEpisode(environment)
- .act()

agent.newEpisode(environment)

Prepare the agent for the next episode. The Agent should perform any cleanup and setup steps that are necessary here. An Environment object is passed in, which the agent should store each time.

Kind: instance method of Agent

Param	Type	Description
environment	Environment	The Environment object for the new episode.

agent.act()

Perform an action for the current timestep. Usually, the agent should at least: 1) dispatch an action to the environment, and 2) perform any necessary internal updates (e.g. updating the value function).

Kind: instance method of Agent
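
A minimal sketch of an Agent, assuming a fixed list of discrete actions chosen uniformly at random. The class name `RandomAgent` and the stub environment are hypothetical and exist only to make the example runnable.

```javascript
// Hypothetical agent that picks uniformly among a fixed list of actions.
class RandomAgent {
  constructor(actions) {
    this.actions = actions;
    this.environment = null;
  }

  newEpisode(environment) {
    // Store the Environment so that act() can dispatch to it.
    this.environment = environment;
  }

  act() {
    const index = Math.floor(Math.random() * this.actions.length);
    this.environment.dispatch(this.actions[index]);
    // A learning agent would also perform its internal updates here,
    // for example updating a value function.
  }
}

// Stub environment that only records dispatched actions (illustration only).
const dispatched = [];
const stubEnvironment = { dispatch: action => dispatched.push(action) };

const agent = new RandomAgent(['LEFT', 'RIGHT']);
agent.newEpisode(stubEnvironment);
agent.act();
```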

EnvironmentFactory

Kind: global interface

environmentFactory.createEnvironment() ⇒ Environment

Kind: instance method of EnvironmentFactory

Environment

Kind: global interface

Environment
- .dispatch(action)
- .getObservation() ⇒ *
- .getReward() ⇒ number
- .isTerminated() ⇒ boolean

environment.dispatch(action)

Apply an action selected by an Agent to the environment. This could be a string representing the action (e.g. "LEFT"), or an array representing the force to apply on actuators, etc.

Kind: instance method of Environment

Param	Type	Description
action	*	An action object specific to the environment.

environment.getObservation() ⇒ *

Get an environment-specific observation for the current timestep. This might be a string identifying the current state, an array representing the current environment parameters, pixel-data representing the agent's vision, etc.

Kind: instance method of Environment
Returns: * - An observation object specific to the environment.

environment.getReward() ⇒ number

Get the reward for the current timestep. Rewards guide the learning of the agent: Positive rewards should be given when the agent selects good actions, and negative rewards should be given when the agent selects bad actions.

Kind: instance method of Environment
Returns: number - A scalar representing the reward for the current timestep.

environment.isTerminated() ⇒ boolean

Return whether or not the current episode is terminated, or finished. For example, this should return true if the agent has reached some goal, if the maximum number of timesteps has been exceeded, or if the agent has otherwise failed. Otherwise, this should return false.

Kind: instance method of Environment
Returns: boolean - A boolean representing whether or not the episode has terminated.
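
To illustrate how the four methods interact over an episode, here is a sketch of a small corridor environment together with a hand-written episode loop. The class `CorridorEnvironment`, its constructor options, and the reward values are invented for this example.

```javascript
// Hypothetical corridor environment: the agent starts at position 0 and the
// episode ends at the goal position or after maxSteps timesteps.
class CorridorEnvironment {
  constructor({ goal = 3, maxSteps = 100 } = {}) {
    this.goal = goal;
    this.maxSteps = maxSteps;
    this.position = 0;
    this.steps = 0;
    this.reward = 0;
  }

  dispatch(action) {
    // Any action other than 'RIGHT' is treated as a step to the left.
    this.position += action === 'RIGHT' ? 1 : -1;
    this.position = Math.max(0, this.position);
    this.steps += 1;
    this.reward = this.position === this.goal ? 1 : -0.1;
  }

  getObservation() {
    return this.position;
  }

  getReward() {
    return this.reward;
  }

  isTerminated() {
    return this.position === this.goal || this.steps >= this.maxSteps;
  }
}

// An EnvironmentFactory only needs to hand out fresh Environment objects.
const environmentFactory = { createEnvironment: () => new CorridorEnvironment() };

// Drive one episode by hand with a fixed action.
const environment = environmentFactory.createEnvironment();
let totalReward = 0;
while (!environment.isTerminated()) {
  environment.dispatch('RIGHT');
  totalReward += environment.getReward();
}
```

In practice an Agent would call `dispatch` from its `act` method; the fixed action here just keeps the loop deterministic.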

FunctionApproximator

Kind: global interface

FunctionApproximator
- .call(args) ⇒ number
- .update(args, error)
- .gradient(args) ⇒ Array.<number>
- .getParameters() ⇒ Array.<number>
- .setParameters(parameters)
- .updateParameters(errors)

functionApproximator.call(args) ⇒ number

Call the function approximator with the given arguments. The FA should return an estimate of the value of the function at the point given by the arguments.

Kind: instance method of FunctionApproximator
Returns: number - The approximated value of the function at the given point

Param	Type	Description
args	*	Arguments to the function being approximated

functionApproximator.update(args, error)

Update the value of the function approximator at the given point.

Kind: instance method of FunctionApproximator

Param	Type	Description
args	*	Arguments to the function being approximated
error	number	The difference between the target value and the currently approximated value

functionApproximator.gradient(args) ⇒ Array.<number>

Compute the gradient of the function approximator at the given point, with respect to its parameters.

Kind: instance method of FunctionApproximator
Returns: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point

Param	Type	Description
args	Array.<number>	Arguments to the function being approximated

functionApproximator.getParameters() ⇒ Array.<number>

Get the differentiable parameters of the function approximator

Kind: instance method of FunctionApproximator
Returns: Array.<number> - The parameters that define the function approximator

functionApproximator.setParameters(parameters)

Set the differentiable parameters of the function approximator

Kind: instance method of FunctionApproximator

Param	Type	Description
parameters	Array.<number>	new parameters for the function approximator

functionApproximator.updateParameters(errors)

Update the parameters in some direction given by an array of errors.

Kind: instance method of FunctionApproximator

Param	Type	Description
errors	Array.<number>	The direction with which to update each parameter
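
Since `getParameters`, `setParameters`, and `call` are enough to probe any implementation from the outside, one use is checking `gradient` against a finite-difference estimate. The helper `numericalGradient` and the toy approximator `line` below are hypothetical names introduced for this sketch.

```javascript
// Hypothetical helper: estimate the gradient of any FunctionApproximator
// by central finite differences over its parameters.
function numericalGradient(approximator, args, epsilon = 1e-6) {
  const original = approximator.getParameters();
  const gradient = original.map((_, i) => {
    const plus = original.slice();
    const minus = original.slice();
    plus[i] += epsilon;
    minus[i] -= epsilon;
    approximator.setParameters(plus);
    const high = approximator.call(args);
    approximator.setParameters(minus);
    const low = approximator.call(args);
    return (high - low) / (2 * epsilon);
  });
  approximator.setParameters(original); // restore the original parameters
  return gradient;
}

// Toy approximator f(x) = w0 + w1 * x, used only to exercise the helper.
const line = {
  parameters: [1, 2],
  call(x) { return this.parameters[0] + this.parameters[1] * x; },
  gradient(x) { return [1, x]; },
  getParameters() { return this.parameters.slice(); },
  setParameters(parameters) { this.parameters = parameters.slice(); },
};

const analytic = line.gradient(3);
const numeric = numericalGradient(line, 3);
```

If the two arrays agree to within a small tolerance, the analytic gradient is at least consistent with `call` at that point.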

PolicyTraces

Kind: global interface

PolicyTraces
- .record(state, action) ⇒ PolicyTraces
- .update(error) ⇒ PolicyTraces
- .decay(amount) ⇒ PolicyTraces
- .reset() ⇒ PolicyTraces

policyTraces.record(state, action) ⇒ PolicyTraces

Records a trace for the given state-action pair.

Kind: instance method of PolicyTraces
Returns: PolicyTraces - This object

Param	Type	Description
state	*	State object of type specific to the environment
action	*	Action object of type specific to the environment

policyTraces.update(error) ⇒ PolicyTraces

Updates the value function based on the stored traces, and the given error.

Kind: instance method of PolicyTraces
Returns: PolicyTraces - This object

Param	Type	Description
error	number	The current TD error

policyTraces.decay(amount) ⇒ PolicyTraces

Decay the traces by the given amount.

Kind: instance method of PolicyTraces
Returns: PolicyTraces - This object

Param	Type	Description
amount	number	The amount to multiply the traces by, usually a value less than 1.

policyTraces.reset() ⇒ PolicyTraces

Reset the traces to their starting values. Usually called at the beginning of an episode.

Kind: instance method of PolicyTraces
Returns: PolicyTraces - This object
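
A rough sketch of one way to implement these methods, assuming the traces accumulate the log-gradients of a Policy and apply them through its `updateParameters` method. That coupling, the class name `GradientPolicyTraces`, and the stub policy are assumptions for illustration; the interface itself does not prescribe them.

```javascript
// Hypothetical PolicyTraces implementation that accumulates log-gradients.
class GradientPolicyTraces {
  constructor(policy) {
    this.policy = policy;
    this.traces = policy.getParameters().map(() => 0);
  }

  record(state, action) {
    const gradient = this.policy.gradient(state, action);
    this.traces = this.traces.map((trace, i) => trace + gradient[i]);
    return this;
  }

  update(error) {
    // Move each parameter along its trace, scaled by the TD error.
    this.policy.updateParameters(this.traces.map(trace => trace * error));
    return this;
  }

  decay(amount) {
    this.traces = this.traces.map(trace => trace * amount);
    return this;
  }

  reset() {
    this.traces = this.traces.map(() => 0);
    return this;
  }
}

// Stub policy with a fixed gradient and a step size of 1 (illustration only).
const stubPolicy = {
  parameters: [0, 0],
  getParameters() { return this.parameters.slice(); },
  gradient() { return [1, -1]; },
  updateParameters(errors) {
    this.parameters = this.parameters.map((p, i) => p + errors[i]);
  },
};

const policyTraces = new GradientPolicyTraces(stubPolicy);
policyTraces.record('s0', 'LEFT').decay(0.5).record('s1', 'LEFT').update(2);
```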

Policy

Kind: global interface

Policy
- .chooseAction(state) ⇒ *
- .chooseBestAction(state) ⇒ *
- .probability(state, action) ⇒ number
- .update(state, action, error)
- .gradient(state, action) ⇒ Array.<number>
- .trueGradient(state, action) ⇒ Array.<number>
- .getParameters() ⇒ Array.<number>
- .setParameters(parameters)
- .updateParameters(errors)

policy.chooseAction(state) ⇒ *

Choose an action given the current state.

Kind: instance method of Policy
Returns: * - An Action object of type specific to the environment

Param	Type	Description
state	*	State object of type specific to the environment

policy.chooseBestAction(state) ⇒ *

Choose the best known action given the current state.

Kind: instance method of Policy
Returns: * - An Action object of type specific to the environment

Param	Type	Description
state	*	State object of type specific to the environment

policy.probability(state, action) ⇒ number

Compute the probability of selecting a given action in a given state.

Kind: instance method of Policy
Returns: number - The probability, between 0 and 1

Param	Type	Description
state	*	State object of type specific to the environment
action	*	Action object of type specific to the environment

policy.update(state, action, error)

Update the probability of choosing a particular action in a particular state. Generally, a positive error should make choosing the action more likely, and a negative error should make choosing the action less likely.

Kind: instance method of Policy

Param	Type	Description
state	Array.<number>	State object of type specific to the environment
action	*	Action object of type specific to the environment
error	number	The direction and magnitude of the update

policy.gradient(state, action) ⇒ Array.<number>

Compute the gradient of the natural logarithm of the probability of choosing the given action in the given state with respect to the parameters of the policy. This can often be computed more efficiently than the true gradient.

Kind: instance method of Policy
Returns: Array.<number> - The gradient of log(π(state, action)) with respect to the parameters of the policy

Param	Type	Description
state	*	State object of type specific to the environment
action	*	Action object of type specific to the environment

policy.trueGradient(state, action) ⇒ Array.<number>

Compute the true gradient of the probability of choosing the given action in the given state with respect to the parameters of the policy. This is in contrast to the log gradient, which is used for most purposes.

Kind: instance method of Policy
Returns: Array.<number> - The gradient of π(state, action) with respect to the parameters of the policy

Param	Type	Description
state	*	State object of type specific to the environment
action	*	Action object of type specific to the environment

policy.getParameters() ⇒ Array.<number>

Get the differentiable parameters of the policy

Kind: instance method of Policy
Returns: Array.<number> - The parameters that define the policy

policy.setParameters(parameters)

Set the differentiable parameters of the policy

Kind: instance method of Policy

Param	Type	Description
parameters	Array.<number>	The parameters that define the policy

policy.updateParameters(errors)

Update the parameters in some direction given by an array of errors.

Kind: instance method of Policy

Param	Type	Description
errors	Array.<number>	The direction with which to update each parameter
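
As an illustration of how the log gradient and the true gradient relate, the following sketches a linear softmax policy. It assumes states are numeric feature arrays and actions are integer indices. The class name `SoftmaxPolicy`, the helper `probabilities`, and the step size `alpha` are assumptions and not part of the interface.

```javascript
// Hypothetical linear softmax policy. Assumes states are numeric feature
// arrays and actions are integer indices 0..actions-1.
class SoftmaxPolicy {
  constructor({ features, actions, alpha = 0.1 }) {
    this.features = features;
    this.actions = actions;
    this.alpha = alpha; // step size (an assumption; not part of the interface)
    this.parameters = new Array(features * actions).fill(0);
  }

  // Helper (not part of the interface): action probabilities for a state.
  probabilities(state) {
    const preferences = [];
    for (let a = 0; a < this.actions; a++) {
      const offset = a * this.features;
      preferences.push(state.reduce((sum, x, i) => sum + this.parameters[offset + i] * x, 0));
    }
    const max = Math.max(...preferences); // subtract the max for numerical stability
    const exps = preferences.map(h => Math.exp(h - max));
    const total = exps.reduce((sum, e) => sum + e, 0);
    return exps.map(e => e / total);
  }

  chooseAction(state) {
    // Sample an action according to the current probabilities.
    const probs = this.probabilities(state);
    let threshold = Math.random();
    for (let a = 0; a < probs.length; a++) {
      threshold -= probs[a];
      if (threshold < 0) return a;
    }
    return probs.length - 1; // guard against rounding error
  }

  chooseBestAction(state) {
    const probs = this.probabilities(state);
    return probs.indexOf(Math.max(...probs));
  }

  probability(state, action) {
    return this.probabilities(state)[action];
  }

  gradient(state, action) {
    // Gradient of log pi(action | state): (1[a == action] - pi(a | state)) * state
    const probs = this.probabilities(state);
    const grad = [];
    for (let a = 0; a < this.actions; a++) {
      const indicator = a === action ? 1 : 0;
      state.forEach(x => grad.push((indicator - probs[a]) * x));
    }
    return grad;
  }

  trueGradient(state, action) {
    // grad pi = pi * grad log pi
    const p = this.probability(state, action);
    return this.gradient(state, action).map(g => g * p);
  }

  update(state, action, error) {
    this.updateParameters(this.gradient(state, action).map(g => g * error));
  }

  getParameters() { return this.parameters.slice(); }
  setParameters(parameters) { this.parameters = parameters.slice(); }
  updateParameters(errors) {
    this.parameters = this.parameters.map((p, i) => p + this.alpha * errors[i]);
  }
}

const policy = new SoftmaxPolicy({ features: 1, actions: 2 });
const before = policy.probability([1], 0); // 0.5 with all-zero parameters
policy.update([1], 0, 1); // positive error: action 0 becomes more likely
const after = policy.probability([1], 0);
```

Here `trueGradient` is derived from `gradient` through the identity ∇π = π ∇log π, which is why the log gradient is usually the cheaper of the two to compute.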

StateTraces

Kind: global interface

StateTraces
- .record(state) ⇒ StateTraces
- .update(error) ⇒ StateTraces
- .decay(amount) ⇒ StateTraces
- .reset() ⇒ StateTraces

stateTraces.record(state) ⇒ StateTraces

Records a trace for the given state

Kind: instance method of StateTraces
Returns: StateTraces - This object

Param	Type	Description
state	*	State object of type specific to the environment

stateTraces.update(error) ⇒ StateTraces

Updates the value function based on the stored traces, and the given error.

Kind: instance method of StateTraces
Returns: StateTraces - This object

Param	Type	Description
error	number	The current TD error

stateTraces.decay(amount) ⇒ StateTraces

Decay the traces by the given amount.

Kind: instance method of StateTraces
Returns: StateTraces - This object

Param	Type	Description
amount	number	The amount to multiply the traces by, usually a value less than 1.

stateTraces.reset() ⇒ StateTraces

Reset the traces to their starting values. Usually called at the beginning of an episode.

Kind: instance method of StateTraces
Returns: StateTraces - This object

StateValueFunction ⇐ FunctionApproximator

Kind: global interface
Extends: FunctionApproximator

StateValueFunction ⇐ FunctionApproximator
- .call(state) ⇒ number
- .update(state, error)
- .gradient(state) ⇒ Array.<number>
- .getParameters() ⇒ Array.<number>
- .setParameters(parameters)
- .updateParameters(errors)

stateValueFunction.call(state) ⇒ number

Estimate the expected value of the returns given a specific state.

Kind: instance method of StateValueFunction
Overrides: call
Returns: number - The approximated state value (v)

Param	Type	Description
state	*	State object of type specific to the environment

stateValueFunction.update(state, error)

Update the value of the function approximator for a given state

Kind: instance method of StateValueFunction
Overrides: update

Param	Type	Description
state	*	State object of type specific to the environment
error	number	The difference between the target value and the currently approximated value
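
As a sketch of how `call` and `update` might be combined, the following performs a single TD(0) update with a tabular implementation. The class name `TabularStateValueFunction`, the fixed list of states, the step size `alpha`, and the discount of 0.9 are assumptions made for this example.

```javascript
// Hypothetical tabular state-value function over a fixed list of states.
// Each state owns one parameter, so the gradient is a one-hot vector.
class TabularStateValueFunction {
  constructor(states, alpha = 0.1) {
    this.index = new Map(states.map((state, i) => [state, i]));
    this.alpha = alpha; // step size (an assumption; not part of the interface)
    this.parameters = states.map(() => 0);
  }

  call(state) {
    return this.parameters[this.index.get(state)];
  }

  update(state, error) {
    this.parameters[this.index.get(state)] += this.alpha * error;
  }

  gradient(state) {
    const target = this.index.get(state);
    return this.parameters.map((_, i) => (i === target ? 1 : 0));
  }

  getParameters() { return this.parameters.slice(); }
  setParameters(parameters) { this.parameters = parameters.slice(); }
  updateParameters(errors) {
    this.parameters = this.parameters.map((p, i) => p + this.alpha * errors[i]);
  }
}

// One TD(0) update for the transition A -> B with reward 1 and discount 0.9.
const valueFunction = new TabularStateValueFunction(['A', 'B'], 0.5);
valueFunction.setParameters([0, 2]);
const tdError = 1 + 0.9 * valueFunction.call('B') - valueFunction.call('A');
valueFunction.update('A', tdError);
```

The error passed to `update` is the difference between the bootstrapped target and the current estimate, matching the parameter description above.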

stateValueFunction.gradient(state) ⇒ Array.<number>

Compute the gradient of the function approximator for a given state, with respect to its parameters.

Kind: instance method of StateValueFunction
Overrides: gradient
Returns: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point