I’m building a RAG pipeline using the NVIDIA meta/llama-3.2-3b-instruct model. When I pass the question together with the context retrieved from the vector embeddings, I get this error: "Input length 1217 exceeds maximum allowed token size 512". I have increased the parameter size on the NVIDIA site and used a new API key, but I still get the same error. Is there a way to solve this, or does the API have a hard limit of 512 tokens? Kindly let me know.
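A note on what may be going on: the 512-token limit usually belongs to the embedding/retrieval model rather than the LLM itself, so the fix is typically to split documents into smaller chunks at ingestion time rather than to change the chat model's settings. Below is a minimal, hedged sketch of such pre-chunking; it approximates token counts with a whitespace split (a real pipeline should use the model's actual tokenizer), and all names here are illustrative, not part of any NVIDIA API.

```python
# Sketch: keep each chunk under the embedder's 512-token limit before ingestion.
# Token counts are approximated by splitting on whitespace; swap in the real
# tokenizer of your embedding model for accurate counts.

MAX_TOKENS = 512

def approx_token_count(text: str) -> int:
    """Rough token estimate; replace with the model's own tokenizer."""
    return len(text.split())

def split_into_chunks(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Greedily pack sentences into chunks that stay under max_tokens.

    Note: a single sentence longer than max_tokens still becomes one
    (oversized) chunk; handle that case separately if it can occur.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in text.split(". "):
        n = approx_token_count(sentence)
        if current and current_len + n > max_tokens:
            chunks.append(". ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(". ".join(current))
    return chunks
```

With chunks sized this way, each embedding call stays under the 512-token ceiling, and only the top-k retrieved chunks (not whole documents) are concatenated into the prompt sent to the LLM.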