google cloud platform – How are Vertex AI rate limits calculated on GCP?
I’m planning to use Google Cloud Platform’s Vertex AI for a few projects. While reading the rate-limits section of the documentation, I came across this page:
cloud.google.com/vertex-ai/generative-ai/docs/quotas
But I haven’t found any information anywhere about the algorithm that sets these limits.
That is, I have two scenarios in mind:
- First scenario: The limits are at fixed times. For example, between 08:00:00 AM and 08:00:59 AM there are 4 million tokens available and at 08:01:00 AM the tokens are reset.
- Second scenario: The limits move as requests are made.
Or maybe it’s different from the scenarios outlined.
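To make the two scenarios concrete, here is a minimal Python sketch of each one. The class names and the `allow` method are hypothetical, and the 4 million token limit and 60-second window are just the numbers from my example. Nothing here is taken from Google's documentation, so it only illustrates the two behaviours I'm asking about, not how Vertex AI actually enforces its quotas.

```python
from collections import deque


class FixedWindowLimiter:
    """Scenario 1 (assumed): the budget resets at fixed clock boundaries."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None  # index of the window seen most recently
        self.used = 0               # tokens consumed in that window

    def allow(self, tokens, now):
        window_id = int(now // self.window)
        if window_id != self.current_window:
            # A new clock window has started, so the counter resets.
            self.current_window = window_id
            self.used = 0
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True


class SlidingWindowLimiter:
    """Scenario 2 (assumed): the budget covers the trailing window before each request."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.events = deque()  # (timestamp, tokens) for accepted requests
        self.used = 0          # tokens consumed inside the trailing window

    def allow(self, tokens, now):
        # Drop accepted requests that have aged out of the trailing window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_tokens = self.events.popleft()
            self.used -= old_tokens
        if self.used + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True


# Timestamps are seconds since an arbitrary start; 59 and 61 straddle a minute boundary.
fixed = FixedWindowLimiter(4_000_000)
sliding = SlidingWindowLimiter(4_000_000)
print(fixed.allow(4_000_000, 59), fixed.allow(1, 61))      # True True
print(sliding.allow(4_000_000, 59), sliding.allow(1, 61))  # True False
```

The difference shows up right after a minute boundary: in the first scenario a request at second 61 succeeds because the counter has reset, while in the second it is rejected until the large request at second 59 ages out of the trailing 60 seconds.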
I would appreciate it if someone could explain how Google calculates these limits, or point me to the section of the documentation that covers it, since I haven’t been able to find it.