I’m building a RAG pipeline using the NVIDIA meta/llama-3.2-3b-instruct model. When I pass the question together with the context retrieved from the vector embeddings, I get this error: "Input length 1217 exceeds maximum allowed token size 512". I have increased the parameter size on the NVIDIA site and used a new API key, but I still get the same error. Is there a way to solve this, or does the API have a hard limit of 512 tokens? Kindly let me know.
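A note on what may be going on: the 512-token limit usually belongs to the embedding/retrieval model rather than the LLM itself, so the fix is typically to split documents into smaller chunks at ingestion time rather than to change the chat model's settings. Below is a minimal, hedged sketch of such pre-chunking; it approximates token counts with a whitespace split (a real pipeline should use the model's actual tokenizer), and all names here are illustrative, not part of any NVIDIA API.

```python
# Sketch: keep each chunk under the embedder's 512-token limit before ingestion.
# Token counts are approximated by splitting on whitespace; swap in the real
# tokenizer of your embedding model for accurate counts.

MAX_TOKENS = 512

def approx_token_count(text: str) -> int:
    """Rough token estimate; replace with the model's own tokenizer."""
    return len(text.split())

def split_into_chunks(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Greedily pack sentences into chunks that stay under max_tokens.

    Note: a single sentence longer than max_tokens still becomes one
    (oversized) chunk; handle that case separately if it can occur.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in text.split(". "):
        n = approx_token_count(sentence)
        if current and current_len + n > max_tokens:
            chunks.append(". ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(". ".join(current))
    return chunks
```

With chunks sized this way, each embedding call stays under the 512-token ceiling, and only the top-k retrieved chunks (not whole documents) are concatenated into the prompt sent to the LLM.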