google cloud platform – What would be the best strategy if you need to partition a table by a field, but the resulting number of partitions exceeds the limit?

At load time, synthesize a column that holds a coarsened (partial) account id, so that the number of distinct values fits within the partition limit.

For example: if account_id is a number in the range 100,000 to 150,000, integer-divide it so the number of distinct values stays under BigQuery's 4,000-partition limit, e.g. DIV(account_id, 20) AS account_set. That gives you ~2,500 values in the range 5,000 to 7,500.

Note: if you have non-numeric account_ids, this will require a deterministic mapping to a numeric value for the account_set.

Create your table with integer-range partitioning:

CREATE TABLE account_data (
  account_id INT64,
  account_set INT64
  -- , ...
)
PARTITION BY RANGE_BUCKET(account_set, GENERATE_ARRAY(5000, 7501, 1))
CLUSTER BY account_set, account_id;
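For the non-numeric case mentioned above, a stable hash makes a workable deterministic mapping. A sketch using BigQuery's FARM_FINGERPRINT (the 2,500 bucket count and the staging_account_data table name are illustrative, not from the original answer):

```sql
-- Map a STRING account_id to a stable integer bucket at load time.
-- MOD keeps the result in (-2499, 2499); ABS folds it into [0, 2499],
-- comfortably under BigQuery's 4,000-partition limit.
SELECT
  account_id,
  ABS(MOD(FARM_FINGERPRINT(account_id), 2500)) AS account_set
FROM staging_account_data;
```

Because FARM_FINGERPRINT is deterministic, the same account_id always lands in the same bucket, which is what partition pruning at query time relies on.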

When you query, specify both account_set and account_id in the predicate. BigQuery will prune to the matching partition using account_set, then use the (account_set, account_id) clustering to select the subset of records within that partition.
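Concretely, a point lookup might look like the sketch below (the divisor is whichever one you used to derive account_set at load time; 20 here is just an example):

```sql
-- The account_set predicate lets BigQuery prune to a single partition;
-- clustering on (account_set, account_id) then narrows the scan inside it.
DECLARE target_account INT64 DEFAULT 123456;

SELECT *
FROM account_data
WHERE account_set = DIV(target_account, 20)  -- same expression as at load time
  AND account_id = target_account;
```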

One caveat: partition_expiration_days operates on whole partitions, so it will expire data for every account that falls in a given partition at once. You'll need a different approach if you need per-account expiration.
