NVIDIA TensorRT Inference Server
1.11.0
  • Documentation home

User Guide

  • Quickstart
    • Prerequisites
    • Using A Prebuilt Docker Container
    • Building With Docker
    • Building With CMake
    • Run TensorRT Inference Server
    • Verify Inference Server Is Running Correctly
    • Getting The Client Examples
    • Running The Image Classification Example
  • Installing the Server
    • Installing Prebuilt Containers
  • Running the Server
    • Example Model Repository
    • Running The Inference Server
    • Running The Inference Server On A System Without A GPU
    • Running The Inference Server Without Docker
    • Checking Inference Server Status
  • Client Libraries
    • Getting the Client Libraries
      • Build Using Dockerfile
      • Build Using CMake
        • Ubuntu 16.04 / Ubuntu 18.04
        • Windows 10
      • Download From GitHub
      • Download Docker Image From NGC
    • Building Your Own Client
    • Client API
      • System Shared Memory
      • CUDA Shared Memory
      • String Datatype
      • Client API for Stateful Models
      • Shape Tensor
  • Client Examples
    • Getting the Client Examples
      • Build Using Dockerfile
      • Build Using CMake
        • Ubuntu 16.04 / Ubuntu 18.04
        • Windows 10
      • Download From GitHub
      • Download Docker Image From NGC
    • Image Classification Example Application
    • Ensemble Image Classification Example Application
    • Performance Measurement Application
  • Models And Schedulers
    • Stateless Models
    • Stateful Models
      • Control Inputs
      • Scheduling Strategies
        • Direct
        • Oldest
    • Ensemble Models
  • Model Repository
    • Modifying the Model Repository
    • Model Versions
    • Framework Model Definition
      • TensorRT Models
      • TensorFlow Models
      • TensorRT/TensorFlow Models
      • ONNX Models
      • PyTorch Models
      • Caffe2 Models
    • Custom Backends
      • Custom Backend API
      • Example Custom Backend
    • Ensemble Backends
  • Model Configuration
    • Generated Model Configuration
    • Datatypes
    • Reshape
    • Version Policy
    • Instance Groups
    • Scheduling And Batching
      • Default Scheduler
      • Dynamic Batcher
      • Sequence Batcher
      • Ensemble Scheduler
    • Optimization Policy
      • TensorRT Optimization
    • Model Warmup
  • Model Management
    • Model Control Mode NONE
    • Model Control Mode POLL
    • Model Control Mode EXPLICIT
  • Optimization
    • Optimization Settings
      • Dynamic Batcher
      • Model Instances
      • Framework-Specific Optimization
        • ONNX with TensorRT Optimization
        • TensorFlow with TensorRT Optimization
    • perf_client
      • Request Concurrency
      • Understanding The Output
      • Visualizing Latency vs. Throughput
      • Input Data
      • Real Input Data
      • Shared Memory
      • Communication Protocol
    • Server Trace
      • JSON Trace Output
      • Trace Summary Tool
  • Metrics

Developer Guide

  • Architecture
    • Concurrent Model Execution
  • Custom Operations
    • TensorRT
    • TensorFlow
    • PyTorch
  • HTTP and GRPC API
    • Health
    • Status
    • Model Control
    • Inference
    • Stream Inference
  • Library API
  • Building
    • Building the Server
      • Building the Server with Docker
        • Incremental Builds with Docker
      • Building the Server with CMake
        • Dependencies
        • Configure Inference Server
        • Build Inference Server
    • Building A Custom Backend
      • Build Using CMake
      • Build Using Custom Backend SDK
      • Using the Custom Instance Wrapper Class
    • Building the Client Libraries and Examples
      • Build Using Dockerfile
      • Build Using CMake
        • Ubuntu 16.04 / Ubuntu 18.04
        • Windows 10
    • Building the Documentation
  • Testing
    • Generate QA Model Repositories
    • Build QA Container
    • Run QA Container
  • Contributing
    • Coding Convention

Reference

  • FAQ
    • What are the advantages of running a model with TensorRT Inference Server compared to running directly using the model’s framework API?
    • Can TensorRT Inference Server run on systems that don’t have GPUs?
    • Can TensorRT Inference Server be used in non-Docker environments?
    • Do you provide client libraries for languages other than C++ and Python?
    • How would you use TensorRT Inference Server within the AWS environment?
    • How do I measure the performance of my model running in the TensorRT Inference Server?
    • How can I fully utilize the GPU with TensorRT Inference Server?
    • If I have a server with multiple GPUs should I use one TensorRT Inference Server to manage all GPUs or should I use multiple inference servers, one for each GPU?
  • Capabilities
  • Protobuf API
    • HTTP/GRPC API
    • Model Configuration
    • Status
  • C++ API
    • Class Hierarchy
    • File Hierarchy
    • Full API
      • Namespaces
        • Namespace nvidia
        • Namespace nvidia::inferenceserver
        • Namespace nvidia::inferenceserver::client
        • Namespace nvidia::inferenceserver::custom
      • Classes and Structs
        • Struct cudaIpcMemHandle_t
        • Struct custom_initdata_struct
        • Struct custom_payload_struct
        • Struct Result::ClassResult
        • Struct InferContext::Stat
        • Class Error
        • Class InferContext
        • Class InferContext::Input
        • Class InferContext::Options
        • Class InferContext::Output
        • Class InferContext::Request
        • Class InferContext::Result
        • Class InferGrpcContext
        • Class InferGrpcStreamContext
        • Class InferHttpContext
        • Class ModelControlContext
        • Class ModelControlGrpcContext
        • Class ModelControlHttpContext
        • Class ModelRepositoryContext
        • Class ModelRepositoryGrpcContext
        • Class ModelRepositoryHttpContext
        • Class ServerHealthContext
        • Class ServerHealthGrpcContext
        • Class ServerHealthHttpContext
        • Class ServerStatusContext
        • Class ServerStatusGrpcContext
        • Class ServerStatusHttpContext
        • Class SharedMemoryControlContext
        • Class SharedMemoryControlGrpcContext
        • Class SharedMemoryControlHttpContext
        • Class CustomInstance
      • Enums
        • Enum custom_memorytype_enum
        • Enum custom_serverparamkind_enum
        • Enum trtserver_errorcode_enum
        • Enum trtserver_memorytype_enum
        • Enum trtserver_metricformat_enum
        • Enum trtserver_modelcontrolmode_enum
        • Enum trtserver_requestoptionsflag_enum
        • Enum trtserver_traceactivity_enum
        • Enum trtserver_tracelevel_enum
      • Functions
        • Function CustomErrorString
        • Function CustomExecute
        • Function CustomExecuteV2
        • Function CustomFinalize
        • Function CustomInitialize
        • Function CustomVersion
        • Function nvidia::inferenceserver::client::operator<<
        • Function TRTSERVER_ErrorCode
        • Function TRTSERVER_ErrorCodeString
        • Function TRTSERVER_ErrorDelete
        • Function TRTSERVER_ErrorMessage
        • Function TRTSERVER_ErrorNew
        • Function TRTSERVER_InferenceRequestOptionsAddClassificationOutput
        • Function TRTSERVER_InferenceRequestOptionsAddInput
        • Function TRTSERVER_InferenceRequestOptionsAddOutput
        • Function TRTSERVER_InferenceRequestOptionsDelete
        • Function TRTSERVER_InferenceRequestOptionsNew
        • Function TRTSERVER_InferenceRequestOptionsSetBatchSize
        • Function TRTSERVER_InferenceRequestOptionsSetCorrelationId
        • Function TRTSERVER_InferenceRequestOptionsSetFlags
        • Function TRTSERVER_InferenceRequestOptionsSetId
        • Function TRTSERVER_InferenceRequestProviderDelete
        • Function TRTSERVER_InferenceRequestProviderInputBatchByteSize
        • Function TRTSERVER_InferenceRequestProviderNew
        • Function TRTSERVER_InferenceRequestProviderNewV2
        • Function TRTSERVER_InferenceRequestProviderSetInputData
        • Function TRTSERVER_InferenceResponseDelete
        • Function TRTSERVER_InferenceResponseHeader
        • Function TRTSERVER_InferenceResponseOutputData
        • Function TRTSERVER_InferenceResponseStatus
        • Function TRTSERVER_MetricsDelete
        • Function TRTSERVER_MetricsFormatted
        • Function TRTSERVER_ProtobufDelete
        • Function TRTSERVER_ProtobufSerialize
        • Function TRTSERVER_ResponseAllocatorDelete
        • Function TRTSERVER_ResponseAllocatorNew
        • Function TRTSERVER_ServerDelete
        • Function TRTSERVER_ServerId
        • Function TRTSERVER_ServerInferAsync
        • Function TRTSERVER_ServerIsLive
        • Function TRTSERVER_ServerIsReady
        • Function TRTSERVER_ServerLoadModel
        • Function TRTSERVER_ServerMetrics
        • Function TRTSERVER_ServerModelRepositoryIndex
        • Function TRTSERVER_ServerModelStatus
        • Function TRTSERVER_ServerNew
        • Function TRTSERVER_ServerOptionsAddTensorFlowVgpuMemoryLimits
        • Function TRTSERVER_ServerOptionsDelete
        • Function TRTSERVER_ServerOptionsNew
        • Function TRTSERVER_ServerOptionsSetExitOnError
        • Function TRTSERVER_ServerOptionsSetExitTimeout
        • Function TRTSERVER_ServerOptionsSetGpuMetrics
        • Function TRTSERVER_ServerOptionsSetLogError
        • Function TRTSERVER_ServerOptionsSetLogInfo
        • Function TRTSERVER_ServerOptionsSetLogVerbose
        • Function TRTSERVER_ServerOptionsSetLogWarn
        • Function TRTSERVER_ServerOptionsSetMetrics
        • Function TRTSERVER_ServerOptionsSetModelControlMode
        • Function TRTSERVER_ServerOptionsSetModelRepositoryPath
        • Function TRTSERVER_ServerOptionsSetPinnedMemoryPoolByteSize
        • Function TRTSERVER_ServerOptionsSetServerId
        • Function TRTSERVER_ServerOptionsSetStartupModel
        • Function TRTSERVER_ServerOptionsSetStrictModelConfig
        • Function TRTSERVER_ServerOptionsSetStrictReadiness
        • Function TRTSERVER_ServerOptionsSetTensorFlowGpuMemoryFraction
        • Function TRTSERVER_ServerOptionsSetTensorFlowSoftPlacement
        • Function TRTSERVER_ServerPollModelRepository
        • Function TRTSERVER_ServerRegisterSharedMemory
        • Function TRTSERVER_ServerSharedMemoryAddress
        • Function TRTSERVER_ServerSharedMemoryStatus
        • Function TRTSERVER_ServerStatus
        • Function TRTSERVER_ServerStop
        • Function TRTSERVER_ServerUnloadModel
        • Function TRTSERVER_ServerUnregisterAllSharedMemory
        • Function TRTSERVER_ServerUnregisterSharedMemory
        • Function TRTSERVER_SharedMemoryBlockCpuNew
        • Function TRTSERVER_SharedMemoryBlockDelete
        • Function TRTSERVER_SharedMemoryBlockGpuNew
        • Function TRTSERVER_SharedMemoryBlockMemoryType
        • Function TRTSERVER_SharedMemoryBlockMemoryTypeId
        • Function TRTSERVER_TraceDelete
        • Function TRTSERVER_TraceId
        • Function TRTSERVER_TraceManagerDelete
        • Function TRTSERVER_TraceManagerNew
        • Function TRTSERVER_TraceModelName
        • Function TRTSERVER_TraceModelVersion
        • Function TRTSERVER_TraceNew
        • Function TRTSERVER_TraceParentId
      • Defines
        • Define CUSTOM_NO_GPU_DEVICE
        • Define CUSTOM_SERVER_PARAMETER_CNT
        • Define DECLSPEC
        • Define TRTIS_CLIENT_HEADER_FLAT
        • Define TRTIS_CUSTOM_EXPORT
        • Define TRTSERVER_EXPORT
      • Typedefs
        • Typedef cudaIpcMemHandle_t
        • Typedef CustomErrorStringFn_t
        • Typedef CustomExecuteFn_t
        • Typedef CustomExecuteV2Fn_t
        • Typedef CustomFinalizeFn_t
        • Typedef CustomGetNextInputFn_t
        • Typedef CustomGetNextInputV2Fn_t
        • Typedef CustomGetOutputFn_t
        • Typedef CustomGetOutputV2Fn_t
        • Typedef CustomInitializeData
        • Typedef CustomInitializeFn_t
        • Typedef CustomMemoryType
        • Typedef CustomPayload
        • Typedef CustomServerParameter
        • Typedef CustomVersionFn_t
        • Typedef nvidia::inferenceserver::CorrelationID
        • Typedef nvidia::inferenceserver::DimsList
        • Typedef TRTSERVER_Error_Code
        • Typedef TRTSERVER_InferenceCompleteFn_t
        • Typedef TRTSERVER_Memory_Type
        • Typedef TRTSERVER_Metric_Format
        • Typedef TRTSERVER_Model_Control_Mode
        • Typedef TRTSERVER_Request_Options_Flag
        • Typedef TRTSERVER_ResponseAllocatorAllocFn_t
        • Typedef TRTSERVER_ResponseAllocatorReleaseFn_t
        • Typedef TRTSERVER_Trace_Activity
        • Typedef TRTSERVER_Trace_Level
        • Typedef TRTSERVER_TraceActivityFn_t
        • Typedef TRTSERVER_TraceManagerCreateTraceFn_t
        • Typedef TRTSERVER_TraceManagerReleaseTraceFn_t
  • Python API
    • Client