Hi, putting this here:
https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen
The reported latency and throughput improvements are significant, though the comparisons are against vLLM. TRT seems to handle batching a bit differently, so I'm unsure whether the same gains would apply here.