Accuracy concerns and deployment references for the VILA model

GitHub - NVlabs/VILA: VILA is a family of state-of-the-art vision language…

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

I used the NVlabs/VILA repository linked above as a reference and ran inference on my EC2 instance, but the results are not accurate. The VILA API provided by NVIDIA (vila Model by NVIDIA | NVIDIA NIM) performs noticeably better on the same task.

What should we do to get the same quality of responses as the VILA API? We also have not found any NIM deployment resources for the Cosmos Nemotron 34B model anywhere.