class nbla::SoftmaxCrossEntropy

template<typename T, typename Tl = int> class SoftmaxCrossEntropy : public nbla::BaseFunction<int>

SoftmaxCrossEntropy calculates the element-wise cross entropy between the input scores and the labels, which are given as category indices, applying Softmax normalization to the scores.

\[ y_{j} = -\ln \left(\frac{\exp(x_{t_j,j})}{\sum_{i'} \exp(x_{i',j})}\right) \]

along the dimension specified by axis, where \(t_j\) is the label (category index) at position \(j\).

SoftmaxCrossEntropy is equivalent to Softmax followed by CategoricalCrossEntropy, but computing both in a single step reduces numerical error.
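To illustrate why the combined computation can be more accurate, the following is a minimal sketch for a single 1-D score vector. It is not the library's implementation; the function names `softmax_cross_entropy_fused` and `softmax_cross_entropy_two_step` are hypothetical, and the fused version assumes the common log-sum-exp formulation with the maximum score subtracted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fused form (hypothetical helper): y = logsumexp(x) - x[t].
// The maximum score is subtracted first so that exp() cannot overflow.
double softmax_cross_entropy_fused(const std::vector<double> &x, std::size_t t) {
  assert(!x.empty() && t < x.size());
  const double x_max = *std::max_element(x.begin(), x.end());
  double sum = 0.0;
  for (double v : x) sum += std::exp(v - x_max);
  return std::log(sum) + x_max - x[t];
}

// Two-step form (hypothetical helper): compute the softmax probability of
// the target class explicitly, then take its negative logarithm.
double softmax_cross_entropy_two_step(const std::vector<double> &x, std::size_t t) {
  assert(!x.empty() && t < x.size());
  double sum = 0.0;
  for (double v : x) sum += std::exp(v);  // may overflow for large scores
  const double p = std::exp(x[t]) / sum;  // may underflow to 0
  return -std::log(p);
}
```

For moderate scores the two functions agree. For scores such as `{1000.0, 0.0}` with target index 1, the two-step form overflows in `exp` and returns infinity, while the fused form returns the finite loss of 1000.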

Inputs (i is the axis along which normalization is taken):

Scores N-D array. ( \(D_1 \times ... \times D_i \times ... \times D_N\))
Labels N-D array. ( \(D_1 \times ... \times 1 \times ... \times D_N\))

Outputs:

Element-wise losses N-D array. ( \(D_1 \times ... \times 1 \times ... \times D_N\))
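The shape relationship above can be sketched with plain arrays. This example assumes a contiguous row-major layout viewed as (outer, classes, inner), where classes is \(D_i\), outer is the product of the dimensions before the axis, and inner is the product of those after it. The helper `cross_entropy_along_axis` is hypothetical and not part of nbla.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: scores has outer * classes * inner elements,
// labels and the returned losses have outer * inner elements
// (the class axis is collapsed to size 1).
std::vector<double> cross_entropy_along_axis(const std::vector<double> &scores,
                                             const std::vector<int> &labels,
                                             std::size_t outer,
                                             std::size_t classes,
                                             std::size_t inner) {
  assert(classes > 0);
  assert(scores.size() == outer * classes * inner);
  assert(labels.size() == outer * inner);
  std::vector<double> losses(outer * inner);
  for (std::size_t o = 0; o < outer; ++o) {
    for (std::size_t k = 0; k < inner; ++k) {
      // Offset of class 0 for this (outer, inner) position.
      const std::size_t base = o * classes * inner + k;
      // Maximum over the class axis, for a stable log-sum-exp.
      double x_max = scores[base];
      for (std::size_t c = 1; c < classes; ++c)
        x_max = std::max(x_max, scores[base + c * inner]);
      double sum = 0.0;
      for (std::size_t c = 0; c < classes; ++c)
        sum += std::exp(scores[base + c * inner] - x_max);
      const int t = labels[o * inner + k];
      assert(t >= 0 && static_cast<std::size_t>(t) < classes);
      const double x_t = scores[base + static_cast<std::size_t>(t) * inner];
      losses[o * inner + k] = std::log(sum) + x_max - x_t;
    }
  }
  return losses;
}
```

With scores of shape (2, 3) and axis 1, the call would use `outer = 2`, `classes = 3`, `inner = 1`, and the labels and losses each hold 2 elements, matching the shapes listed above.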

Template Parameters:

T – Data type for computation and score variable.
Tl – Data type of label variable.

Param axis:

Axis along which normalization is taken.

Public Functions

inline virtual shared_ptr<Function> copy() const: Create another instance of this Function with the same context.

inline virtual vector<dtypes> in_types()

Get input dtypes.

The last in_type is used repeatedly if the size of in_types is smaller than the number of inputs.

inline virtual vector<dtypes> out_types()

Get output dtypes.

The last out_type is used repeatedly if the size of out_types is smaller than the number of outputs.
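The repetition rule stated for in_types and out_types can be sketched as a small lookup. The helper `type_for_index` below is hypothetical and only illustrates the rule; it is not an nbla function, and the element type `D` stands in for whatever dtype representation is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the declared type for the variable at index i.
// Indices past the end of the list fall back to the last declared type.
template <typename D>
D type_for_index(const std::vector<D> &types, std::size_t i) {
  assert(!types.empty());
  return types[std::min(i, types.size() - 1)];
}
```

For a list of two declared types, indices 0 and 1 map to their own entries, and any higher index maps to the second entry.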

inline virtual int min_inputs()

Get minimum number of inputs.

This is meant to be used in the setup function together with in_types, which is used to get the maximum number of inputs.

inline virtual int min_outputs()

Get minimum number of outputs.

This is meant to be used in the setup function together with out_types, which is used to get the maximum number of outputs.

inline virtual string name(): Get the function name as a string.

inline virtual vector<string> allowed_array_classes(): Get array classes that are allowed to be specified by Context.

inline virtual bool grad_depends_output_data(int i, int o) const

Dependency flag for checking if in-grad depends on out-data.

Checks whether the gradient computation of the i-th input requires the data of the o-th output.

Note

If any input requires output variable data when computing its gradient, this function must be overridden to return the appropriate boolean value. Otherwise, backward computation will be incorrect.

Parameters:

i – [in] Input variable index.
o – [in] Output variable index.