Add quantizer support

The ArmnnQuantizer program loads a 32-bit floating-point network into Arm NN and converts the network into a quantized asymmetric 8-bit network or a quantized symmetric 16-bit network. A quantized network can reduce the computational and memory cost of executing neural networks.

The Arm NN Quantizer quantizes Arm NN networks to the following data types:

  • QAsymmS8
  • QSymmS16
  • QAsymmU8
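
To make the difference between the asymmetric and symmetric data types concrete, the following is a minimal sketch of how a scale and offset could be derived from a floating-point range. It uses the standard affine quantization formulas and is not taken from the Arm NN sources: the names ToyQuantParams, ClampToRange, ComputeAsymmetricParams, ComputeSymmetricParams, and QuantizeValue are hypothetical, and the sketch assumes min < max. The Arm NN implementation may differ in detail.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper type for illustration only; not part of the Arm NN API.
struct ToyQuantParams
{
    float   scale;   // real-valued size of one quantized step
    int32_t offset;  // quantized value that represents real 0.0
};

// Clamp a value to the inclusive range [low, high].
long ClampToRange(long value, long low, long high)
{
    return std::max(low, std::min(high, value));
}

// Asymmetric scheme: map [min, max] onto the full integer range [qmin, qmax].
// The range is widened to include 0.0 so that zero is exactly representable.
// Assumption: min < max, so the scale is never zero.
ToyQuantParams ComputeAsymmetricParams(float min, float max, int32_t qmin, int32_t qmax)
{
    min = std::min(0.0f, min);
    max = std::max(0.0f, max);
    const float scale  = (max - min) / static_cast<float>(qmax - qmin);
    const long  offset = std::lround(static_cast<float>(qmin) - min / scale);
    return {scale, static_cast<int32_t>(ClampToRange(offset, qmin, qmax))};
}

// Symmetric scheme: real 0.0 always maps to quantized 0, so the offset is 0
// and only the largest magnitude of the range determines the scale.
ToyQuantParams ComputeSymmetricParams(float min, float max, int32_t qmax)
{
    const float extent = std::max(std::fabs(min), std::fabs(max));
    return {extent / static_cast<float>(qmax), 0};
}

// Quantize one real value with the given parameters, saturating at the limits.
int32_t QuantizeValue(float value, const ToyQuantParams& params, int32_t qmin, int32_t qmax)
{
    const long quantized = std::lround(value / params.scale) + params.offset;
    return static_cast<int32_t>(ClampToRange(quantized, qmin, qmax));
}
```

Under these assumptions, the range [0.0f, 1.0f] gives a scale of 1/255 with offset 0 for an unsigned 8-bit asymmetric type (ComputeAsymmetricParams(0.0f, 1.0f, 0, 255)), the same scale with offset -128 for a signed 8-bit asymmetric type, and a scale of 1/32767 with offset 0 for a 16-bit symmetric type.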

To add a network layer to the quantizer:

  1. Declare and implement the Visit<LayerName>() function in the src/armnn/QuantizerVisitor.hpp and src/armnn/QuantizerVisitor.cpp files. The quantizer calls Visit<LayerName>() when it visits a layer of that type while traversing the network graph during quantization. The following code shows the QuantizerVisitor.hpp declaration for an example SoftmaxLayer:
    // Softmax
    
        void VisitSoftmaxLayer(const IConnectableLayer* layer,
                               const SoftmaxDescriptor& softmaxDescriptor,
                               const char* name = nullptr) override;

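    The following code shows the QuantizerVisitor.cpp implementation for an example SoftmaxLayer:

    // Softmax
    void QuantizerVisitor::VisitSoftmaxLayer(const IConnectableLayer* layer,
                                             const SoftmaxDescriptor& softmaxDescriptor,
                                             const char* name)
    {
        // Add the equivalent layer, with the same descriptor, to the quantized network
        IConnectableLayer* newLayer = m_QuantizedNetwork->AddSoftmaxLayer(softmaxDescriptor, name);
        // Associate the original layer with its quantized counterpart
        RecordLayer(layer, newLayer);
        // Connect the inputs of the new layer within the quantized network
        SetQuantizedInputConnections(layer, newLayer);
    }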
  2. Add unit tests in src/armnn/test/QuantizerTest.cpp. Each unit test creates a simple network that contains the new layer, quantizes that network, and then uses the visitor pattern to run a test class over the quantized network.
    1. Create an implementation of the test class TestSoftmaxQuantization. If the default values are not correct for the new layer, you must specify the expected values using TestQuantizationParams. The following example code creates an implementation of the test class TestSoftmaxQuantization:
      // Softmax
      
      BOOST_AUTO_TEST_CASE(QuantizeSoftmax)
      {
          class TestSoftmaxQuantization : public TestQuantization
          {
          public:
              TestSoftmaxQuantization(const TensorShape& inputShape, const TensorShape& outputShape)
              : TestQuantization(inputShape, outputShape) {}
      
              TestSoftmaxQuantization(const QuantizerOptions& options,
                                      const TensorShape& inputShape,
                                      const TensorShape& outputShape)
              : TestQuantization(options, inputShape, outputShape) {}
      
              void VisitSoftmaxLayer(const IConnectableLayer* layer,
                                     const SoftmaxDescriptor& descriptor,
                                     const char* name = nullptr) override
              {
                  IgnoreUnused(descriptor, name);
                  TensorInfo info = layer->GetOutputSlot(0).GetTensorInfo();
      
                  // Based on the default static range [0.0f, 1.0f]
                  TestQuantizationParams(info, {1.0f / g_AsymmU8QuantizationBase, 0},
                                               {1.0f / g_AsymmS8QuantizationBase, -128},
                                               {1.0f / g_SymmS8QuantizationBase,  0},
                                               {1.0f / g_SymmS16QuantizationBase, 0});
              }
          };
      
      ...
    2. The test network is input→newLayer→output. The following code shows how to create this network for the SoftmaxLayer using the CreateNetworkWithSoftmaxLayer() helper function, and then quantize and test it with the default options:
      // Softmax
      
          SoftmaxDescriptor descriptor;
          descriptor.m_Beta = 1.0f;
      
          const TensorShape shape{1U};
          INetworkPtr network = CreateNetworkWithSoftmaxLayer(descriptor, shape);
      
          INetworkPtr quantizedNetworkQAsymmU8 = INetworkQuantizer::Create(network.get())->ExportNetwork();
          TestSoftmaxQuantization validatorQAsymmU8(shape, shape);
          VisitLayersTopologically(quantizedNetworkQAsymmU8.get(), validatorQAsymmU8);
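    3. Quantize and test the network for QAsymmU8, QAsymmS8, QSymmS8, and QSymmS16. The code in the previous step covers QAsymmU8, which INetworkQuantizer::Create() uses when no QuantizerOptions are passed. The following code covers the remaining data types:
      // Softmax
      
          // test QAsymmS8 quantization
          const QuantizerOptions qAsymmS8Options(DataType::QAsymmS8);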
          INetworkPtr quantizedNetworkQAsymmS8 = INetworkQuantizer::Create(network.get(), qAsymmS8Options)->ExportNetwork();
          TestSoftmaxQuantization validatorQAsymmS8(qAsymmS8Options, shape, shape);
          VisitLayersTopologically(quantizedNetworkQAsymmS8.get(), validatorQAsymmS8);
      
          // test QSymmS8 quantization
          const QuantizerOptions qSymmS8Options(DataType::QSymmS8);
          INetworkPtr quantizedNetworkQSymmS8 = INetworkQuantizer::Create(network.get(), qSymmS8Options)->ExportNetwork();
          TestSoftmaxQuantization validatorQSymmS8(qSymmS8Options, shape, shape);
          VisitLayersTopologically(quantizedNetworkQSymmS8.get(), validatorQSymmS8);
      
          // test QSymmS16 quantization
          const QuantizerOptions qSymmS16options(DataType::QSymmS16);
          INetworkPtr quantizedNetworkQSymmS16 = INetworkQuantizer::Create(network.get(), qSymmS16options)->ExportNetwork();
          TestSoftmaxQuantization validatorQSymmS16(qSymmS16options, shape, shape);
          VisitLayersTopologically(quantizedNetworkQSymmS16.get(), validatorQSymmS16);