Decomposing a large DFT into smaller DFTs, where some of the computations are redundant, is exactly the trick the Cooley-Tukey algorithm uses. So your FFT is already doing that internally, and I doubt you’ll write a better FFT. Four parallel FFTs aren’t sufficient on their own: you need a butterfly structure afterwards to combine them again. (Your consideration of the four 250 MHz streams¹ overlooks the fact that your original signal has a bandwidth of 1 GHz, so if you just do an FFT on every fourth sample, you’ll have aliasing; the butterfly is effectively a mathematical way to cancel that aliasing using the information from the other three FFTs.)
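To make the combining step concrete, here is a minimal sketch in plain Python. It assumes a decimation-in-time split into four streams (samples r, r+4, r+8, ... for r = 0..3) and uses a naive DFT as a stand-in for whatever FFT you actually have; the names `dft` and `combine_four` are made up for this illustration. Each short DFT is multiplied by a twiddle factor and the four results are summed, which reproduces the full-length DFT.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT, standing in for any FFT routine."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def combine_four(x):
    """Full-length DFT of x built from four quarter-length DFTs."""
    n = len(x)
    assert n % 4 == 0, "length must be a multiple of 4"
    m = n // 4
    # One short DFT per decimated stream: samples r, r+4, r+8, ...
    subs = [dft(x[r::4]) for r in range(4)]
    out = []
    for k in range(n):
        # The twiddle factor exp(-2*pi*j*r*k/n) accounts for the r-sample
        # offset of stream r; summing the four terms cancels the aliases
        # that each decimated stream contains on its own.
        out.append(sum(cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k % m]
                       for r in range(4)))
    return out

# Arbitrary complex test signal (illustrative values only).
x = [complex(i % 5, (3 * i) % 7) for i in range(16)]
full = dft(x)
combined = combine_four(x)
max_err = max(abs(a - b) for a, b in zip(full, combined))
print(max_err < 1e-9)

# A single decimated stream alone is aliased: each of its bins is the
# average of four bins of the full spectrum, spaced len(x)//4 apart.
m = len(x) // 4
aliased = dft(x[0::4])
folded = [sum(full[k + q * m] for q in range(4)) / 4 for k in range(m)]
alias_err = max(abs(a - b) for a, b in zip(aliased, folded))
print(alias_err < 1e-9)
```

The second check shows why one FFT on every fourth sample isn’t enough: its bins are a mix of four different bins of the true spectrum, and only the twiddled sum over all four streams separates them again.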
You might want to read the Radix-2 section of the Wikipedia page on the Cooley-Tukey algorithm.
¹ What’s wrong with your consideration is that, for all practical purposes, your Nyquist bandwidth equals the sampling rate, since the DFT deals with complex signals, not only real ones.