Google Cloud Platform – How are Vertex AI rate limits calculated on GCP?

I’m planning to use Google Cloud Platform’s Vertex AI for a few projects. While reading the rate limits section of the documentation, I came across this:

cloud.google.com/vertex-ai/generative-ai/docs/quotas

But I haven’t found any information about the algorithm that enforces these limits.
Specifically, I have two scenarios in mind:

  • First scenario (fixed window): The limit applies to fixed clock
    intervals. For example, between 08:00:00 AM and 08:00:59 AM there
    are 4 million tokens available, and at 08:01:00 AM the count resets.
  • Second scenario (sliding window): The interval moves with each
    request, so the limit applies to the 60 seconds before that request.

Or maybe it’s different from the scenarios outlined.
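
To make the two scenarios concrete, here is a minimal Python sketch of what I mean. It only illustrates the two candidate behaviors and does not describe how Google actually enforces the quota. The class names, the `allow` method, and the 60-second window are my own assumptions; the 4 million figure is the one from the example above.

```python
from collections import deque


class FixedWindowLimiter:
    """Scenario 1: the budget resets at fixed clock boundaries (e.g. each minute)."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_index = None  # which clock window the counter belongs to
        self.used = 0

    def allow(self, tokens, now):
        index = int(now // self.window_seconds)
        if index != self.window_index:
            # A new clock window started, so the counter resets to zero.
            self.window_index = index
            self.used = 0
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True


class SlidingWindowLimiter:
    """Scenario 2: the budget covers the trailing window before each request."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, tokens) of accepted requests
        self.used = 0

    def allow(self, tokens, now):
        # Drop accepted requests that have aged out of the trailing window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_tokens = self.events.popleft()
            self.used -= old_tokens
        if self.used + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True


# Timestamps are seconds since 08:00:00 (hypothetical values for illustration).
fixed = FixedWindowLimiter(limit=4_000_000)
sliding = SlidingWindowLimiter(limit=4_000_000)

# Spend the whole budget at 08:00:59.
fixed.allow(4_000_000, now=59.0)
sliding.allow(4_000_000, now=59.0)

# Two seconds later, at 08:01:01, the two scenarios disagree.
fixed_ok = fixed.allow(1, now=61.0)      # new clock minute, counter was reset
sliding_ok = sliding.allow(1, now=61.0)  # the 08:00:59 usage is still inside the window
```

Under the first scenario a client could spend the full budget at 08:00:59 and again at 08:01:00, while under the second it would have to wait until the earlier usage is more than 60 seconds old. That difference is what I am trying to find out.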

I would appreciate it if someone could explain how Google calculates these limits, or point me to the section of the documentation that covers it, since I haven’t been able to find it.