TensorRT is an acceleration library developed by NVIDIA for its own platform. TensorRT does two things to speed up a model:
1. TensorRT supports INT8 and FP16 computation. Deep learning networks are usually trained with 32-bit or 16-bit data, but a network does not need such high precision at inference time, so TensorRT lowers the precision to accelerate inference (the FP16 builder flag appears in the sketch after this list).
2. TensorRT restructures the network, fusing operations that can be combined and optimizing the graph for the characteristics of the GPU. Most deep learning frameworks are not tuned for GPU performance, and NVIDIA, as the producer of the GPUs, naturally offers an acceleration tool for its own hardware: TensorRT. In an unoptimized model, a convolution layer, a bias layer, and a ReLU layer each require a separate call to the corresponding cuDNN API, three calls in total, even though the three layers can be implemented as one fused kernel. TensorRT merges such layers wherever it can; a typical inception block is a good illustration of this fusion.

TensorRT is used for model inference and optimization, and it provides a Python interface. In my tests, inference through the Python interface is almost as fast as through C++. Below I record the TensorRT Python workflow in detail, from environment configuration to model conversion and then to inference; if I have time I will also summarize and record INT8 quantization of models. I am using TensorRT 7.0, which supports forward inference with dynamic input shapes, so the discussion below is split into static inference and dynamic inference.
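To make the Python workflow concrete, here is a minimal sketch of building an engine from an ONNX model with the TensorRT 7 Python API. The file name model.onnx, the function name build_engine, and the workspace size are my own assumptions for illustration:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
# TensorRT 7 requires the explicit-batch flag when parsing ONNX models
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(onnx_path, fp16=False):
    """Parse an ONNX model and build a TensorRT engine (sketch)."""
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(EXPLICIT_BATCH) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser, \
         builder.create_builder_config() as config:
        config.max_workspace_size = 1 << 30  # 1 GiB of scratch space for layer tactics
        if fp16 and builder.platform_has_fast_fp16:
            config.set_flag(trt.BuilderFlag.FP16)  # reduced precision, see point 1 above
        with open(onnx_path, 'rb') as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        return builder.build_engine(network, config)

engine = build_engine('model.onnx', fp16=True)  # 'model.onnx' is a placeholder path
```

During this build step TensorRT performs the layer fusion described above, so the resulting engine contains fewer, larger kernels than the original graph.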
Configuring TensorRT is very simple: register on the NVIDIA website, fill out a short questionnaire, and you can download it. I use TensorRT-7.0.0.1.
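After unpacking the archive, install the Python wheel shipped in the tarball's python/ directory and add its lib/ directory to LD_LIBRARY_PATH (these paths are assumptions based on the standard tarball layout). A quick sanity check that the installation works:

```python
import tensorrt as trt

print(trt.__version__)  # expect a 7.0.0.x version string
builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
print(builder.platform_has_fast_fp16)  # True if the GPU has fast FP16 support
```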