Extending Loupe

Adding a new operation to Loupe requires implementing a new Function class.
Any new Function should do three things:

1. Populate the computational graph with details of the operation
2. Implement a forward() method to compute the result of the operation
3. Implement a backward() method to compute gradients as needed during a
   backward pass
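The remainder of this section walks through each of these steps in detail. As
a preview, here is a minimal sketch of what a complete Function can look like.
The element-wise exponential shown here is hypothetical (it is not part of
Loupe) and relies only on the interfaces described below: loupe.asarray(), the
loupe.core.Function constructor, getdata(), and cache_for_backward().

import numpy as np
import loupe

class exp(loupe.core.Function):
    """Element-wise exponential: y = exp(x)"""

    def __init__(self, x):
        # Store the input as an instance attribute and register it on the
        # computational graph
        self.input = loupe.asarray(x)
        super().__init__(self.input)

    def forward(self):
        input = self.input.getdata()
        result = np.exp(input)
        # The result is reused in backward(), so cache it here
        self.cache_for_backward(result)
        return result

    def backward(self, grad):
        result, = self.cache
        # dy/dx = exp(x) = y, so the input gradient is y * grad
        self.input.backward(result * grad)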
Populating the computational graph

New functions should inherit from loupe.core.Function. The class constructor
can take as many arguments as needed to perform the operation, and may provide
default values. Because access to the inputs is needed by other class methods,
they should be stored as instance attributes by the class constructor. Inputs
are registered to the computational graph by passing them to the Function
class constructor.

Any Python type may be accepted, but only loupe.array or Function objects may
be tracked on the computational graph. The asarray() method can be used to
ensure inputs implement the array interface.

Note that only inputs that may be optimization parameters need to be
registered on the graph. There is no harm in registering all inputs, but doing
so may impact overall performance.
As a motivating example, we’ll recreate the power function here step-by-step
while providing additional comments to describe the underlying logic:

import loupe

class power(loupe.core.Function):
    def __init__(self, x1, x2):
        # Store the input as an instance attribute. Because the input may
        # vary during optimization, it must be array-like
        self.input = loupe.asarray(x1)

        # Because the exponent will not change once it is defined, it does
        # not need to be array-like and can be stored in its native format
        self.exp = x2

        # Register the input on the graph. Since the exponent will not
        # vary during optimization, there is no need to register it
        # on the graph.
        super().__init__(self.input)
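At this point the class cannot be evaluated yet (forward() is still missing),
but constructing it is already enough to register the input on the graph. A
small usage sketch, with arbitrary illustrative values:

import numpy as np

# Because the constructor calls loupe.asarray() on x1, the input can be
# provided as any array-like object (here, a plain NumPy array)
y = power(np.arange(4.0), 3)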
forward()

forward() is called any time the function needs to be evaluated. The result of
the operation is computed using the inputs passed to the Function constructor,
and the computed result should be returned as a NumPy array.

Because some of the function inputs may be functions themselves, it is
necessary to explicitly grab the underlying data from any Loupe arrays or
functions using either the data attribute or the getdata() method. Because
this operation may be computationally expensive, it should only be performed
once during function evaluation.

Some of the input data used in forward() may also be needed in backward().
Since fetching the input data can be expensive, it is desirable to cache the
results so they do not need to be recomputed when backward() is called. The
cache_for_backward() method provides a simple list-based caching mechanism for
this purpose.
Returning to our implementation of power(), we’ll write the code to compute
\(y = x^n\):

import numpy as np
import loupe

class power(loupe.core.Function):
    def __init__(self, x1, x2):
        self.input = loupe.asarray(x1)
        self.exp = x2
        super().__init__(self.input)

    def forward(self):
        # Get the input's underlying data
        input = self.input.getdata()

        # Cache the input's data for use in backward()
        self.cache_for_backward(input)

        # Compute and return the result. Note we can use self.exp
        # directly since it is constant once defined in __init__()
        return np.power(input, self.exp)
Note that both data and getdata() (when called with the default arguments)
return the same underlying data, but getdata() provides additional flexibility
if needed and is therefore the preferred approach.
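As a quick illustration of this equivalence (this sketch assumes
loupe.asarray() accepts any array-like input, such as a Python list):

import numpy as np
import loupe

x = loupe.asarray([1.0, 2.0, 3.0])

# With default arguments, getdata() returns the same underlying data as .data
assert np.array_equal(x.data, x.getdata())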
backward()

backward() is called any time the gradient of the function needs to be
computed. The grad argument is always provided and represents the gradient
with respect to the given output of the function. backward() should compute
the gradient of each function input with requires_grad=True and pass the
result to the input’s backward() method.

Note that any inputs that were cached in forward() are available in the
function’s cache attribute. Because cache is a Python list, list unpacking is
the recommended approach for accessing the cached values.

We’ll complete our development of the power function by implementing its
backward() method to compute \(\bar{x} = n x^{n-1} \bar{y}\):
import numpy as np
import loupe

class power(loupe.core.Function):
    def __init__(self, x1, x2):
        self.input = loupe.asarray(x1)
        self.exp = x2
        super().__init__(self.input)

    def forward(self):
        input = self.input.getdata()
        self.cache_for_backward(input)
        return np.power(input, self.exp)

    def backward(self, grad):
        # Unpack the value of input that was cached during the
        # call to forward()
        input, = self.cache

        # Compute the gradient of this function and pass the
        # result to input.backward()
        grad = self.exp * np.power(input, self.exp - 1) * grad
        self.input.backward(grad)
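The finished function can now take part in both forward evaluation and the
backward pass. The sketch below is illustrative only: it assumes that a
loupe.array can be created with a requires_grad flag and that the gradient
passed to an array's backward() method accumulates in a grad attribute, which
may differ from the actual loupe API.

import numpy as np
import loupe

# Hypothetical usage; requires_grad and .grad are assumptions about the
# loupe.array interface rather than verified API
x = loupe.array([1.0, 2.0, 3.0], requires_grad=True)
y = power(x, 2)

print(y.getdata())       # evaluates forward(): [1. 4. 9.]
y.backward(np.ones(3))   # seed the backward pass with an all-ones gradient
print(x.grad)            # expected analytic gradient 2x = [2. 4. 6.]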