Use of int8 for variant_contig results in integer overflow with fragmented reference genomes

We should probably add an option to specify the dtype for variant_contig. Even int16 (maximum value 32767) will overflow sometimes, since there are lots of VCFs out there with huge numbers of contigs.

That said, this is the sort of thing we should be able to query the IO library for (“how many contigs are there” should be efficiently computable on any indexed VCF), so we should be able to detect the minimal dtype automatically. Even then, people might still want to specify the dtype manually, for their own reasons.
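
To make the idea concrete, here is a rough sketch of how the dtype could be picked once the contig count is known. The helper name `contig_index_dtype` and its `dtype` override parameter are hypothetical and not part of any existing API. It assumes variant_contig stores zero-based contig indexes in a signed integer type, and it leaves out the step of actually asking the IO library for the contig count.

```python
from typing import Optional

# Largest value each signed integer dtype can hold: 2**(bits - 1) - 1.
# Listed from smallest to largest so the first fit is the minimal dtype.
_SIGNED_MAX = {
    "int8": 2**7 - 1,
    "int16": 2**15 - 1,
    "int32": 2**31 - 1,
    "int64": 2**63 - 1,
}


def contig_index_dtype(n_contigs: int, dtype: Optional[str] = None) -> str:
    """Return the name of a signed dtype that can hold indexes 0..n_contigs-1.

    Hypothetical helper. If `dtype` is given it is honoured, but only after
    checking that it is wide enough, so a bad choice raises instead of
    silently wrapping around.
    """
    if n_contigs < 0:
        raise ValueError("n_contigs must be non-negative")
    max_index = max(n_contigs - 1, 0)

    if dtype is not None:
        if dtype not in _SIGNED_MAX:
            raise ValueError(f"unsupported dtype: {dtype!r}")
        if max_index > _SIGNED_MAX[dtype]:
            raise OverflowError(
                f"{dtype} cannot hold contig index {max_index} "
                f"(max {_SIGNED_MAX[dtype]})"
            )
        return dtype

    # No override: pick the narrowest dtype that fits the largest index.
    for name, limit in _SIGNED_MAX.items():
        if max_index <= limit:
            return name
    raise OverflowError(f"no supported dtype can hold contig index {max_index}")
```

With this, a reference with 128 contigs still fits in int8 (largest index 127), while 129 contigs would bump it to int16. Passing `dtype="int8"` for a 1000-contig reference raises an OverflowError rather than producing wrapped indexes. The returned names are the standard NumPy dtype strings, so they could be passed straight to whatever allocates the array.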
