Dynamic shared quota (DSQ) serves your pay-as-you-go (PayGo) requests flexibly, adapting to your workload without requiring you to manage quotas or submit quota increase requests (QIRs). DSQ distributes the available PayGo capacity for a specific model and region among the customers sending requests. There is no preset quota limit: your requests are served as long as capacity is available.
Supported models
The following Gemini models and their supervised fine-tuned models support DSQ:
The following legacy Gemini models support DSQ:
- Gemini 1.5 Pro
- Gemini 1.5 Flash
How DSQ works
Dynamic shared quota (DSQ) adapts to your traffic patterns and needs without a preset quota and serves your requests as long as capacity is available. With DSQ, you don't submit a quota increase request (QIR) whenever traffic increases, because there is no quota that might throttle your requests.
To prevent large traffic spikes from a few customers from interfering with other customers who send smaller, steadier traffic, DSQ applies traffic control by setting a tokens per second (TPS) threshold at the organization level. This TPS threshold differs from a standard quota: requests above the threshold aren't automatically throttled. Instead, DSQ assigns a different priority to a request depending on whether it falls within or above the TPS threshold. As a result, traffic spikes beyond the TPS threshold don't interfere with requests that are within the threshold.
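The prioritization described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how DSQ is implemented: the one-second sliding window, the two priority labels, the `OrgWindow` class, the `serve` function, and all numeric values are hypothetical and chosen purely for illustration.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical priority labels: a lower value is served first.
WITHIN, ABOVE = 0, 1


@dataclass
class OrgWindow:
    """Tracks the tokens one organization sent during the last second."""
    tps_threshold: int  # hypothetical per-organization threshold
    events: deque = field(default_factory=deque)  # (timestamp, tokens)
    used: int = 0

    def classify(self, now: float, tokens: int) -> int:
        # Drop events that are at least one second old.
        while self.events and now - self.events[0][0] >= 1.0:
            _, old_tokens = self.events.popleft()
            self.used -= old_tokens
        self.events.append((now, tokens))
        self.used += tokens
        # Nothing is rejected here: the request only receives a priority.
        return WITHIN if self.used <= self.tps_threshold else ABOVE


def serve(requests, capacity_tokens):
    """Serve (priority, request_id, tokens) tuples in priority order."""
    served = []
    # sorted() is stable, so arrival order is kept within each priority.
    for _priority, request_id, tokens in sorted(requests, key=lambda r: r[0]):
        if tokens <= capacity_tokens:
            capacity_tokens -= tokens
            served.append(request_id)
    return served


steady = OrgWindow(tps_threshold=1000)
spiky = OrgWindow(tps_threshold=1000)

queue = [
    (spiky.classify(0.0, 900), "spiky-1", 900),
    (spiky.classify(0.1, 900), "spiky-2", 900),  # pushes the org above its threshold
    (steady.classify(0.2, 400), "steady-1", 400),
]

print(serve(queue, capacity_tokens=1500))  # ['spiky-1', 'steady-1']
print(serve(queue, capacity_tokens=3000))  # ['spiky-1', 'steady-1', 'spiky-2']
```

In this toy model, the above-threshold request is skipped only when capacity is scarce. When capacity is plentiful, it is served like any other request, which mirrors the point that the threshold sets priority rather than acting as a hard limit.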
Gemini requests with multimodal inputs are also subject to the corresponding system rate limits for image, audio, video, and document inputs.
To help ensure high availability for your application and to get predictable service levels for your production workloads, see Provisioned Throughput.
What's next
- To learn about quotas and limits for Vertex AI, see Vertex AI quotas and limits.
- To learn more about Google Cloud quotas and limits, see Understand quota values and system limits.