class nbla::BinaryConnectConvolution

template<typename T> class BinaryConnectConvolution : public nbla::BaseFunction<int, const vector<int>&, const vector<int>&, const vector<int>&, int, float>

N-D BinaryConnect Convolution with bias.

Reference: M. Courbariaux, Y. Bengio, and J.-P. David. “BinaryConnect: Training Deep Neural Networks with binary weights during propagations.” Advances in Neural Information Processing Systems. 2015.

NOTES:

1) If you would like to share weights between layers, share the standard floating-point weights (input parameter #2), not the binarized weights (input parameter #3).

2) The weights and the binary weights are in sync only after a call to forward(), not after a call to backward(). To store the parameters of the network, call forward() once beforehand; otherwise the stored weights and binary weights will not be in sync.

Inputs ( \(B\) is base_axis):

Input \((B + 1 + N)\)-D array ( \(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
Weight \((2 + N)\)-D array ( \(C' \times C \times K_1 \times ... \times K_N\)).
Binary Weight \((2 + N)\)-D array ( \(C' \times C \times K_1 \times ... \times K_N\)).
(optional) Bias vector ( \(C'\)).

Outputs:

\((B + 1 + N)\)-D array ( \( M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N \)).
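The shapes above do not spell out how each output length \(L'_i\) is obtained. The following is a minimal sketch, assuming the usual dilated-convolution arithmetic \(L'_i = \lfloor (L_i + 2 p_i - d_i (K_i - 1) - 1) / s_i \rfloor + 1\). The helper names `conv_output_size` and `conv_output_shape` are hypothetical and not part of the nbla API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an nbla function): output length along one
// spatial axis, assuming standard dilated-convolution arithmetic.
inline int conv_output_size(int input, int kernel, int pad, int stride,
                            int dilation) {
  assert(kernel > 0 && pad >= 0 && stride > 0 && dilation > 0);
  // A dilated kernel covers dilation * (kernel - 1) + 1 input positions.
  const int effective_kernel = dilation * (kernel - 1) + 1;
  assert(input + 2 * pad >= effective_kernel);
  return (input + 2 * pad - effective_kernel) / stride + 1;
}

// Hypothetical helper: full output shape (M_1, ..., M_B, C', L'_1, ..., L'_N)
// from an input shape (M_1, ..., M_B, C, L_1, ..., L_N).
inline std::vector<int> conv_output_shape(const std::vector<int>& in_shape,
                                          int base_axis, int out_channels,
                                          const std::vector<int>& kernel,
                                          const std::vector<int>& pad,
                                          const std::vector<int>& stride,
                                          const std::vector<int>& dilation) {
  const std::size_t n = kernel.size();
  assert(base_axis >= 0);
  assert(in_shape.size() == static_cast<std::size_t>(base_axis) + 1 + n);
  assert(pad.size() == n && stride.size() == n && dilation.size() == n);
  // Sample dimensions are copied unchanged.
  std::vector<int> out(in_shape.begin(), in_shape.begin() + base_axis);
  // The channel dimension C is replaced by C'.
  out.push_back(out_channels);
  // Each spatial dimension is reduced independently.
  for (std::size_t i = 0; i < n; ++i) {
    out.push_back(conv_output_size(in_shape[base_axis + 1 + i], kernel[i],
                                   pad[i], stride[i], dilation[i]));
  }
  return out;
}
```

For example, under these assumptions a batch of 8 three-channel 32×32 inputs with base_axis 1, 16 output channels, a 3×3 kernel, padding 1, stride 1, and dilation 1 keeps its spatial size and yields a shape of (8, 16, 32, 32).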

See also

For dilated convolution (a.k.a. à trous convolution), refer to:

Chen et al., DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. https://arxiv.org/abs/1606.00915
Yu et al., Multi-Scale Context Aggregation by Dilated Convolutions. https://arxiv.org/abs/1511.07122

Template Parameters:: T – Data type for computation.
Param base_axis:: Base axis of Convolution operation. Dimensions up to base_axis are treated as sample dimensions.
Param pad:: Padding sizes for dimensions.
Param stride:: Stride sizes for dimensions.
Param dilation:: Dilation sizes for dimensions.
Param group:: Number of groups of channels. This makes connections across channels sparser by grouping connections along map direction.
Param quantize_zero_to:: Input value at zero is quantized to this value.
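To illustrate the role of quantize_zero_to, here is a minimal sketch of weight binarization. It assumes the deterministic sign binarization described in the BinaryConnect paper, with exact zeros mapped to quantize_zero_to. The function names `binarize` and `binarize_weights` are hypothetical, and the sketch is not taken from the library source.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an nbla function): sign binarization of a single
// weight. Positive values map to +1, negative values to -1, and an exact
// zero maps to quantize_zero_to.
inline float binarize(float w, float quantize_zero_to) {
  if (w > 0.0f) return 1.0f;
  if (w < 0.0f) return -1.0f;
  return quantize_zero_to;
}

// Hypothetical helper: binarize a whole weight buffer. The floating-point
// weights are left untouched; the binarized copy is returned separately,
// mirroring the separate Weight and Binary Weight inputs.
inline std::vector<float> binarize_weights(const std::vector<float>& weights,
                                           float quantize_zero_to) {
  std::vector<float> binary;
  binary.reserve(weights.size());
  for (float w : weights) {
    binary.push_back(binarize(w, quantize_zero_to));
  }
  assert(binary.size() == weights.size());
  return binary;
}
```

Keeping the floating-point weights and their binarized copy in separate buffers is also why the two can drift apart: if only the floating-point weights are updated, the binarized copy stays stale until it is recomputed.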

Public Functions

inline virtual shared_ptr<Function> copy() const: Copy another instance of Function with the same context.

inline virtual vector<dtypes> in_types()

Get input dtypes.

The last in_type is used repeatedly if in_types has fewer entries than there are inputs.
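The repetition rule can be sketched as a simple lookup. This is an illustration only: the `DType` enum stands in for nbla's dtypes, and `type_for_index` is a hypothetical helper, not a function provided by the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for nbla's dtypes enumeration.
enum class DType { Float, Int };

// Hypothetical helper: dtype expected for the i-th variable. If the list is
// shorter than the number of variables, the last entry is reused.
inline DType type_for_index(const std::vector<DType>& types, std::size_t i) {
  assert(!types.empty());
  return i < types.size() ? types[i] : types.back();
}
```

With a single-entry list, every input index resolves to that one dtype, which is how a short list can describe a function with several inputs of the same type.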

inline virtual vector<dtypes> out_types()

Get output dtypes.

The last out_type is used repeatedly if out_types has fewer entries than there are outputs.

inline virtual int min_inputs()

Get minimum number of inputs.

This is meant to be used in the setup function together with in_types, which is used to get the maximum number of inputs.

inline virtual int min_outputs()

Get minimum number of outputs.

This is meant to be used in the setup function together with out_types, which is used to get the maximum number of outputs.

inline virtual string name(): Get the function name as a string.

inline virtual vector<string> allowed_array_classes(): Get array classes that are allowed to be specified by Context.

inline virtual bool grad_depends_output_data(int i, int o) const

Dependency flag for checking if in-grad depends on out-data.

Checks whether the gradient computation of the i-th input requires the data of the o-th output.

Note

If any input requires output variable data when computing its gradient, this function must be overridden to return the appropriate boolean value. Otherwise, backward computation will be incorrect.

Parameters:

i – [in] Input variable index.
o – [in] Output variable index.