[Source Code Reading] TensorRT: Caffe Parser and Plugin

Article link: https://blog.csdn.net/Chris_zhangrx/article/details/123296787


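This article studies the TensorRT source at the code level and tries to extract some of the implementation ideas and details. My own understanding is limited and the main goal is to learn along the way, so corrections and discussion are welcome wherever I have misunderstood something.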

Note: this article does not cover feature-level usage, such as how to add a Plugin step by step or how to call specific interfaces.

Preface

The open-source part of TensorRT consists mainly of the Parsers and the Plugins, together with demos and a series of samples that let users get started quickly.

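An inference engine is essentially a graph compiler, whose core pieces are the mid-end graph optimizations and the runtime implementation. Only the front end and the plugins are open-sourced; my view is that this makes it easier for users to understand how TensorRT parses a model, and helps those with custom needs implement the parsing of their own custom operators (plugins).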

Caffe parser

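The core logic of the parser is this: TensorRT implements the kernels for the supported layers itself, and the parser adds the layers of the input model one by one to an INetworkDefinition object through its API.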

INetworkDefinition network;
network.addInput(const char* name, DataType type, Dims dimensions); // add an input
network.addPluginV2(...);  // add a custom operator
network.markOutput(ITensor& tensor); // mark a tensor as a network output
network.addActivation(ITensor& input, ActivationType type); // add an activation layer
...

One implementation detail worth noting: INetworkDefinition inherits from a class named INoCopy, implemented as follows:

class INoCopy
{
protected:
    INoCopy() = default;
    virtual ~INoCopy() = default;
    INoCopy(const INoCopy& other) = delete;
    INoCopy& operator=(const INoCopy& other) = delete;
    INoCopy(INoCopy&& other) = delete;
    INoCopy& operator=(INoCopy&& other) = delete;
};

class INetworkDefinition : public INoCopy
{
public:
...
};

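The base class is non-copyable and non-movable, so such objects can only be manipulated through pointers or references. Every class that inherits from it is therefore non-copyable and non-movable as well; trying to copy or move one produces a compile error like this: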

copy constructor of 'INetworkDefinition' is implicitly deleted because base class 'INoCopy' has a deleted copy constructor

The language rule behind this is described at https://en.cppreference.com/w/cpp/language/copy_constructor :

The implicitly-declared or defaulted copy constructor for class `T` is defined as *deleted* if any of the following conditions are true:

T has non-static data members that cannot be copied (have deleted, inaccessible, or ambiguous copy constructors);
T has direct or virtual base class that cannot be copied (has deleted, inaccessible, or ambiguous copy constructors);
T has direct or virtual base class or a non-static data member with a deleted or inaccessible destructor;

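Many classes in the implementation need to be non-copyable and non-movable, so a shared base class like INoCopy saves every subclass from having to mark its own copy and move constructors and assignment operators as deleted.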
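
The effect can be checked at compile time. The following is a minimal sketch, not TensorRT code: NoCopyBase and ToyNetwork are hypothetical stand-ins for INoCopy and a class derived from it. The derived class declares no copy or move members of its own, yet the type traits report that all four operations are unavailable, while access through a pointer still works.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for TensorRT's INoCopy (same shape as the class shown above).
class NoCopyBase
{
protected:
    NoCopyBase() = default;
    virtual ~NoCopyBase() = default;
    NoCopyBase(const NoCopyBase&) = delete;
    NoCopyBase& operator=(const NoCopyBase&) = delete;
    NoCopyBase(NoCopyBase&&) = delete;
    NoCopyBase& operator=(NoCopyBase&&) = delete;
};

// Hypothetical derived class; it declares no copy or move members itself.
class ToyNetwork : public NoCopyBase
{
public:
    void addLayer() { ++mLayerCount; }
    int layerCount() const { return mLayerCount; }

private:
    int mLayerCount{0};
};

// The implicitly declared copy/move members of ToyNetwork are defined as deleted.
static_assert(!std::is_copy_constructible<ToyNetwork>::value, "copy construction is disabled");
static_assert(!std::is_move_constructible<ToyNetwork>::value, "move construction is disabled");
static_assert(!std::is_copy_assignable<ToyNetwork>::value, "copy assignment is disabled");
static_assert(!std::is_move_assignable<ToyNetwork>::value, "move assignment is disabled");

// The object can still be modified through a pointer (or a reference).
int buildTwoLayers()
{
    ToyNetwork net;
    ToyNetwork* handle = &net;
    handle->addLayer();
    handle->addLayer();
    return net.layerCount();
}
```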

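Loading and parsing the original model works by matching on the op type, which ties each layer to its mapping function. Every LayerParameter has a type attribute of type string, and a map binds that name to the corresponding function: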

// Type alias
typedef nvinfer1::ILayer* (*LayerParseFn)(nvinfer1::INetworkDefinition&, const trtcaffe::LayerParameter&, CaffeWeightFactory&, BlobNameToTensor&);

// Function declarations
nvinfer1::ILayer* parseConvolution(nvinfer1::INetworkDefinition& network, const trtcaffe::LayerParameter& msg, CaffeWeightFactory& weightFactory, BlobNameToTensor& tensors);
nvinfer1::ILayer* parsePooling(nvinfer1::INetworkDefinition& network, const trtcaffe::LayerParameter& msg, CaffeWeightFactory& /*weightFactory*/, BlobNameToTensor& tensors);
nvinfer1::ILayer* parsePReLU(nvinfer1::INetworkDefinition& network, const trtcaffe::LayerParameter& msg, CaffeWeightFactory& weightFactory, BlobNameToTensor& tensors);
...

// Mapping table
static std::unordered_map<std::string, LayerParseFn> gParseTable
{
    {"Convolution", parseConvolution},
    {"Pooling", parsePooling},
    {"ReLU", parseReLU},
    ...
};

// Look up the parse function and call it
std::string op_type = layerMsg.type();
auto v = gParseTable.find(op_type);
ILayer* layer = (*v->second)(network, layerMsg, weights, *static_cast<BlobNameToTensor*>(mBlobNameToTensor));
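
The excerpt above does not show what happens when the op type is missing from the table, in which case `v` would equal `end()` and must not be dereferenced. Below is a minimal sketch of the same name-to-function dispatch with that check added. Every name in it (ToyLayerMsg, ToyParseFn, parseToyConvolution, parseToyReLU, dispatchToyLayer) is hypothetical, and the parse functions only return a tag instead of building a layer.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for trtcaffe::LayerParameter: only the op type is kept.
struct ToyLayerMsg
{
    std::string type;
};

// Same idea as LayerParseFn: a plain function pointer type shared by all parse functions.
using ToyParseFn = const char* (*)(const ToyLayerMsg&);

const char* parseToyConvolution(const ToyLayerMsg&) { return "conv"; }
const char* parseToyReLU(const ToyLayerMsg&) { return "relu"; }

// Looks up the parse function by op type and calls it.
// Returns nullptr for an unknown type instead of dereferencing end().
const char* dispatchToyLayer(const ToyLayerMsg& msg)
{
    static const std::unordered_map<std::string, ToyParseFn> table{
        {"Convolution", parseToyConvolution},
        {"ReLU", parseToyReLU},
    };
    const auto it = table.find(msg.type);
    if (it == table.end())
    {
        return nullptr; // unsupported layer type
    }
    return it->second(msg);
}
```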

Take the parse function of the fairly simple ReLU operator, parseReLU, to get a rough idea of the mapping logic:

ILayer* parseReLU(INetworkDefinition& network, const trtcaffe::LayerParameter& msg, CaffeWeightFactory& /*weightFactory*/, BlobNameToTensor& tensors)
{
    if (!checkBlobs(msg, 1, 1)) // check the number of inputs (bottom) and outputs (top): one of each is expected
    {
        return nullptr;
    }

    const trtcaffe::ReLUParameter& p = msg.relu_param(); // fetch the layer's parameters

    if (p.has_negative_slope() && p.negative_slope() != 0)
    {
        auto newLayer = network.addActivation(*tensors[msg.bottom(0)], ActivationType::kLEAKY_RELU); // add the new TensorRT layer
        newLayer->setAlpha(p.negative_slope()); // set the corresponding attribute
        return newLayer;
    }
    return network.addActivation(*tensors[msg.bottom(0)], ActivationType::kRELU); // add the new TensorRT layer
}

Next, look at the InnerProduct operator. The difference is that this operator carries weights, which are handled through the CaffeWeightFactory class:

ILayer* parseInnerProduct(INetworkDefinition& network, const trtcaffe::LayerParameter& msg, CaffeWeightFactory& weightFactory, BlobNameToTensor& tensors)
{
    const trtcaffe::InnerProductParameter& p = msg.inner_product_param();

    int64_t nbInputs = parserutils::volume(parserutils::getCHW(tensors[msg.bottom(0)]->getDimensions()));
    int64_t nbOutputs = p.num_output();

    float std_dev = 1.0F / sqrtf(nbInputs * nbOutputs);
    Weights kernelWeights = weightFactory.isInitialized() ? weightFactory(msg.name(), WeightType::kGENERIC) : weightFactory.allocateWeights(nbInputs * nbOutputs, std::normal_distribution<float>(0.0F, std_dev));
    Weights biasWeights = !p.has_bias_term() || p.bias_term() ? (weightFactory.isInitialized() ? weightFactory(msg.name(), WeightType::kBIAS) : weightFactory.allocateWeights(nbOutputs)) : weightFactory.getNullWeights();

    weightFactory.convert(kernelWeights);
    weightFactory.convert(biasWeights);
    return network.addFullyConnected(*tensors[msg.bottom(0)], p.num_output(), kernelWeights, biasWeights);
}
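
In the function above, when the weight factory has not been initialized (no trained weights are available), the kernel is filled with random values drawn from a normal distribution with mean 0 and standard deviation 1 / sqrt(nbInputs * nbOutputs). A rough sketch of that fallback path follows. makeRandomKernel and its seed parameter are hypothetical and exist only for illustration; they are not part of CaffeWeightFactory.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper mirroring the "no trained weights" branch of parseInnerProduct:
// allocate nbInputs * nbOutputs floats and fill them from N(0, stdDev).
std::vector<float> makeRandomKernel(int64_t nbInputs, int64_t nbOutputs, unsigned seed)
{
    const float stdDev = 1.0F / std::sqrt(static_cast<float>(nbInputs * nbOutputs));
    std::mt19937 generator(seed); // fixed seed so the sketch is reproducible
    std::normal_distribution<float> distribution(0.0F, stdDev);

    std::vector<float> weights(static_cast<std::size_t>(nbInputs * nbOutputs));
    for (float& w : weights)
    {
        w = distribution(generator);
    }
    return weights;
}
```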

CaffeWeightFactory is built around Caffe's native blob data structure, BlobProto: it retrieves the relevant data and information from it and maps them to the Weights class that TensorRT requires.

class Weights
{
public:
    DataType type;      //!< The type of the weights.
    const void* values; //!< The weight values, in a contiguous array.
    int64_t count;      //!< The number of weights in the array.
};

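A blob can be located by name, or by index within its layer structure; for a Caffe conv layer, for example, blobs[0] is the weight and blobs[1] is the bias. Here the blob is converted into a Weights object that carries only the data, and an extra enum class, WeightType, indicates whether the Weights object corresponds to the weight or the bias blob in the original framework. The enum value is converted to the matching int index, which avoids magic numbers at the lookup site and makes the intent explicit.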

// Enum class definition
enum class WeightType
{
    // types for convolution, deconv, fully connected
    kGENERIC = 0, // typical weights for the layer: e.g. filter (for conv) or matrix weights (for innerproduct)
    kBIAS = 1,    // bias weights

    // These enums are for BVLCCaffe, which are incompatible with nvCaffe enums below.
    // See batch_norm_layer.cpp in BLVC source of Caffe
    kMEAN = 0,
    kVARIANCE = 1,
    kMOVING_AVERAGE = 2,

    // These enums are for nvCaffe, which are incompatible with BVLCCaffe enums above
    // See batch_norm_layer.cpp in NVidia fork of Caffe
    kNVMEAN = 0,
    kNVVARIANCE = 1,
    kNVSCALE = 3,
    kNVBIAS = 4
};

// Usage
weightFactory(msg.name(), WeightType::kGENERIC); // look up the layer's weights
weightFactory(msg.name(), WeightType::kBIAS); // look up the layer's bias

// Part of the lookup logic
...
  for (int i = 0, n = mMsg.layer_size(); i < n; i++)
  {
      if (mMsg.layer(i).name() == layerName && index < mMsg.layer(i).blobs_size())
      {
          return &mMsg.layer(i).blobs(index); // index the blob by layer name and the enum passed in
      }
  }
...
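
The lookup above can be reproduced in a self-contained form. This is only a sketch under simplifying assumptions: ToyLayer, ToyWeightType, findBlob and makeToyModel are hypothetical names, and each blob is reduced to a plain vector of floats rather than a BlobProto.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, reduced version of WeightType: the enum value doubles as the blob index.
enum class ToyWeightType
{
    kGENERIC = 0, // blobs[0]: weights
    kBIAS = 1     // blobs[1]: bias
};

// Hypothetical stand-in for a Caffe layer: a name plus its list of blobs.
struct ToyLayer
{
    std::string name;
    std::vector<std::vector<float>> blobs;
};

// Finds the blob of the named layer selected by the enum, or nullptr if it does not exist.
const std::vector<float>* findBlob(
    const std::vector<ToyLayer>& layers, const std::string& layerName, ToyWeightType type)
{
    const int index = static_cast<int>(type); // enum class needs an explicit cast
    for (const ToyLayer& layer : layers)
    {
        if (layer.name == layerName && index < static_cast<int>(layer.blobs.size()))
        {
            return &layer.blobs[index];
        }
    }
    return nullptr;
}

// Builds a tiny model: conv1 has weights and a bias, fc1 has weights only.
std::vector<ToyLayer> makeToyModel()
{
    ToyLayer conv;
    conv.name = "conv1";
    conv.blobs.push_back({1.0F, 2.0F, 3.0F});
    conv.blobs.push_back({0.5F});

    ToyLayer fc;
    fc.name = "fc1";
    fc.blobs.push_back({4.0F});

    return {conv, fc};
}
```

A call site such as `findBlob(model, "conv1", ToyWeightType::kBIAS)` reads more clearly than passing a bare `1`.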

To summarize:

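Caffe's blob data structure is converted into the Weights class, and the WeightType enum acts as the index, stating explicitly what each Weights object means.
The ITensor data structure is what flows between layers inside TensorRT.
Each op type maps to a parse function, and these functions build the whole network through the API.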

Plugin

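With the Caffe parser flow above in mind, the overall idea behind Plugin operators is easy to follow. TensorRT provides abstract base classes that declare a set of pure virtual functions; users implement the pure virtual functions required by the interface for the operator they want. This resembles the Template Method design pattern: the framework has already fixed how the pure virtual interfaces of a custom layer are used, that is, the template, and users derive a subclass and fill in the implementation, with the binding deferred to run time.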
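
The division of labor can be sketched in a few lines. This is a toy illustration, not TensorRT's interface: ToyPlugin, runToyPlugin and ToyClipPlugin are hypothetical, and the "framework" is reduced to one function that fixes the call order initialize, enqueue, terminate.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified plugin interface (not TensorRT's IPluginV2).
class ToyPlugin
{
public:
    virtual ~ToyPlugin() = default;
    virtual std::string type() const = 0;
    virtual int initialize() = 0;
    virtual void enqueue(const std::vector<float>& input, std::vector<float>& output) = 0;
    virtual void terminate() = 0;
};

// Framework side: fixes the call order and knows nothing about concrete plugins.
std::vector<float> runToyPlugin(ToyPlugin& plugin, const std::vector<float>& input)
{
    std::vector<float> output;
    if (plugin.initialize() != 0)
    {
        return output; // initialization failed, nothing was computed
    }
    plugin.enqueue(input, output);
    plugin.terminate();
    return output;
}

// User side: a custom operator that clips every value into [lo, hi].
class ToyClipPlugin : public ToyPlugin
{
public:
    ToyClipPlugin(float lo, float hi) : mLo(lo), mHi(hi) {}

    std::string type() const override { return "ToyClip"; }
    int initialize() override { return 0; }
    void enqueue(const std::vector<float>& input, std::vector<float>& output) override
    {
        output.clear();
        for (float v : input)
        {
            output.push_back(v < mLo ? mLo : (v > mHi ? mHi : v));
        }
    }
    void terminate() override {}

private:
    float mLo;
    float mHi;
};
```

The framework function is written once against the abstract interface; which `enqueue` actually runs is decided at run time by the object passed in.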

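Depending on the scenario, TensorRT provides three base classes that a custom operator can inherit from. One of the official implementations illustrates this: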

class BatchedNMSPlugin : public IPluginV2Ext
{
public:
    BatchedNMSPlugin(NMSParameters param);
    BatchedNMSPlugin(const void* data, size_t length);
    ~BatchedNMSPlugin() override = default;

    // IPluginV2 methods
    const char* getPluginType() const noexcept override;
    const char* getPluginVersion() const noexcept override;
    int getNbOutputs() const noexcept override;
    Dims getOutputDimensions(int index, const Dims* inputs, int nbInputDims) noexcept override;
    bool supportsFormat(DataType type, PluginFormat format) const noexcept override;
    size_t getWorkspaceSize(int maxBatchSize) const noexcept override;
    int32_t enqueue(int32_t batchSize, void const* const* inputs, void* const* outputs, void* workspace,
        cudaStream_t stream) noexcept override;
    int initialize() noexcept override;
    void terminate() noexcept override;
    size_t getSerializationSize() const noexcept override;
    void serialize(void* buffer) const noexcept override;
    void destroy() noexcept override;
    void setPluginNamespace(const char* libNamespace) noexcept override;
    const char* getPluginNamespace() const noexcept override;
    void setClipParam(bool clip) noexcept;
    void setScoreBits(int32_t scoreBits) noexcept;
	
private: // add whatever private members your own implementation needs
    NMSParameters param{};
    int boxesSize{};
    int scoresSize{};
    int numPriors{};
    std::string mNamespace;
    bool mClipBoxes{};
    DataType mPrecision;
    int32_t mScoreBits;
    pluginStatus_t mPluginStatus{};
};

IPluginV2Ext and IPluginV2DynamicExt each add a few more pure virtual interfaces on top of IPluginV2.

The inheritance chain is IPluginV2DynamicExt : public IPluginV2Ext : public IPluginV2.

Once the operator class is implemented, it has to be registered; TensorRT provides the IPluginCreator class for this.
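
The general idea of creator-based registration can be illustrated with a toy registry. The sketch below is an assumption-laden illustration of the pattern only, not TensorRT's registry or the IPluginCreator interface: ToyOp, ToyNmsOp, ToyRegistry and kToyNmsRegistered are all hypothetical. A creator is stored under a type and version key, and the framework later asks the registry to create an instance by that key.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical operator interface and one concrete operator.
struct ToyOp
{
    virtual ~ToyOp() = default;
    virtual std::string type() const = 0;
};

struct ToyNmsOp : ToyOp
{
    std::string type() const override { return "ToyNMS"; }
};

// Hypothetical registry: maps "type/version" to a creator callback.
class ToyRegistry
{
public:
    using Creator = std::function<std::unique_ptr<ToyOp>()>;

    static ToyRegistry& instance()
    {
        static ToyRegistry registry;
        return registry;
    }

    // Returns false if a creator with the same type and version already exists.
    bool registerCreator(const std::string& type, const std::string& version, Creator creator)
    {
        return mCreators.emplace(type + "/" + version, std::move(creator)).second;
    }

    // Returns a new operator instance, or an empty pointer for an unknown key.
    std::unique_ptr<ToyOp> create(const std::string& type, const std::string& version) const
    {
        const auto it = mCreators.find(type + "/" + version);
        if (it == mCreators.end())
        {
            return std::unique_ptr<ToyOp>();
        }
        return it->second();
    }

private:
    std::map<std::string, Creator> mCreators;
};

// Registration happens during static initialization, before main() runs.
const bool kToyNmsRegistered = ToyRegistry::instance().registerCreator(
    "ToyNMS", "1", [] { return std::unique_ptr<ToyOp>(new ToyNmsOp()); });
```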

At the network level there is a single unified API for adding a plugin:

// Function declaration
IPluginV2Layer* addPluginV2(ITensor* const* inputs, int32_t nbInputs, IPluginV2& plugin);

Implement the interfaces required for the Plugin operator, rebuild the relevant library, and TensorRT can then load and recognize it.

Takeaways from the code

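1. Use meaningful enums instead of magic numbers and let them do useful work, such as indexing.
2. Make the base class destructor protected so that code outside the class hierarchy cannot delete such pointers directly and resources can only be released through the designated interface. As a consequence, a base-class pointer cannot be handed to a smart pointer with the default deleter either.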

#include <iostream>
#include <memory>

class Parent
{
public:
    Parent() // constructor
    {
        std::cout << "Parent constructor called\n";
    }

    void destroy() // the designated interface for releasing the object
    {
        delete this;
    }

protected:
    virtual ~Parent() // protected destructor; virtual so that destroy() also runs ~Child()
    {
        std::cout << "Parent destructor called\n";
    }
};

class Child : public Parent
{
public:
    Child() // constructor
    {
        std::cout << "Child constructor called\n";
    }

    ~Child() override // destructor
    {
        std::cout << "Child destructor called\n";
    }
};

int main()
{
    Parent* p1 = new Parent();
    p1->destroy(); // OK
    // delete p1;  // ERROR: ~Parent() is protected

    Parent* p2 = new Child();
    p2->destroy(); // OK
    // delete p2;  // ERROR: ~Parent() is protected

    // std::shared_ptr<Parent> p3(new Parent); // ERROR: the default deleter cannot call ~Parent()
    return 0;
}

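3. Take advantage of implicitly declared special member functions, as with INoCopy above: put the constructors or destructor that need to be restricted into a dedicated base class and have all subclasses inherit from it, so they do not have to be redeclared in every subclass and there is less code. The following is another example.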

class VRoot
{
public:
    virtual ~VRoot() noexcept = default;
};

class VHostMemory : public VRoot {...};
class VDimensionExpr : public VRoot{...};
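
A follow-up to takeaway 2: a protected destructor rules out smart pointers only when they use the default deleter. Supplying a custom deleter that calls the designated release interface makes them usable again. The sketch below shows one way to do this; ToyResource, ToyDestroyer, ToyResourcePtr and destroyedAfterScope are hypothetical names, and the counter exists only so the destruction can be observed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class with a protected destructor and a destroy() release interface.
class ToyResource
{
public:
    explicit ToyResource(int* destroyedCounter) : mDestroyed(destroyedCounter) {}

    void destroy() { delete this; } // allowed: a member function can reach the destructor

protected:
    virtual ~ToyResource() { ++(*mDestroyed); } // records that destruction happened

private:
    int* mDestroyed;
};

// Custom deleter that goes through destroy() instead of calling delete directly.
struct ToyDestroyer
{
    void operator()(ToyResource* resource) const
    {
        if (resource != nullptr)
        {
            resource->destroy();
        }
    }
};

using ToyResourcePtr = std::unique_ptr<ToyResource, ToyDestroyer>;

// Creates a resource in an inner scope and reports how many destructions were recorded.
int destroyedAfterScope()
{
    int destroyed = 0;
    {
        ToyResourcePtr resource(new ToyResource(&destroyed));
    } // ToyDestroyer runs here and calls destroy()
    return destroyed;
}
```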