Code to try this yourself is here: https://github.com/ConsistentlyInconsistentYT/Pixeltovoxelprojector
Twitter: https://x.com/ConsistInconsis
Fact checking: Is this new? For the most part, yes. The closest prior work I could find is tennis-ball tracking, which detects the balls by the similarity of pixels to their neon yellow color, not by their motion. That only works for them because the balls appear large on their cameras, which are much closer and higher resolution. The cost here is so low because you don't have to place the cameras so they align into one voxel grid: they can be scattered around in random orientations and contribute to many different voxel grids, massively reducing the number of cameras needed. The amount of data you need to send to the central voxel processor isn't high either, since you can just stack the images on top of each other and send over the "long exposure", and that's before you process the image and voxel grid to recognize regions of interest and send those over in higher quality (see the sketch below). It also doesn't take many images to recognize the F-35s in the first place: I was able to do it confidently with only maybe 10-20 1920x1080 images (depending on how you decide when you have identified it), which really isn't much data at all.
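To make the camera-placement claim concrete, here is a minimal sketch (not the repo's actual code) of how a single camera with a known but otherwise arbitrary pose could splat its motion evidence into a shared voxel grid. It assumes a simple pinhole camera model and a fixed-step ray march; all function names and parameters here are illustrative.

```python
import numpy as np

def splat_motion_into_grid(motion_img, cam_pos, cam_rot, fov_deg,
                           grid, grid_origin, voxel_size,
                           threshold=10.0, steps=256):
    """Cast a ray into the voxel grid for every pixel that saw motion.

    motion_img  : HxW float array of per-pixel motion energy
    cam_pos     : (3,) camera position in world space
    cam_rot     : (3,3) camera-to-world rotation matrix
    grid        : (Nx,Ny,Nz) float array, accumulated in place
    grid_origin : (3,) world position of the grid's corner voxel
    """
    h, w = motion_img.shape
    # Pinhole focal length in pixels, derived from the horizontal FOV.
    focal = 0.5 * w / np.tan(np.radians(fov_deg) / 2.0)
    ys, xs = np.nonzero(motion_img > threshold)
    for y, x in zip(ys, xs):
        # Pixel -> viewing direction in camera space, then rotate into world space.
        d = cam_rot @ np.array([x - w / 2.0, y - h / 2.0, focal])
        d /= np.linalg.norm(d)
        # Fixed-step march: every voxel the ray crosses gains this pixel's motion energy.
        for t in range(1, steps + 1):
            p = cam_pos + (t * voxel_size) * d
            i, j, k = ((p - grid_origin) // voxel_size).astype(int)
            if 0 <= i < grid.shape[0] and 0 <= j < grid.shape[1] and 0 <= k < grid.shape[2]:
                grid[i, j, k] += motion_img[y, x]
```

Where rays from independently placed cameras pile up in the same voxels, the grid lights up at the object's 3D position; that intersection is what removes the need for careful camera alignment.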
The images were rendered using raytracing in Blender, as I unfortunately (or fortunately) don't have any F-35s flying overhead. However, this is how they would really look, so it made generating images easier; there is absolutely nothing telling the algorithm that an object is there other than what is in the RGB color values of the images.
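Since the motion signal comes purely from per-pixel RGB change between frames, here is a minimal sketch of extracting it by frame differencing and stacking the result into the "long exposure" image mentioned above (again illustrative, not the repo's exact pipeline):

```python
import numpy as np

def motion_long_exposure(frames):
    """Stack absolute frame-to-frame RGB differences into one image.

    frames: iterable of HxWx3 uint8 arrays from a single camera.
    Static background cancels out; a moving aircraft leaves a bright
    streak. Only this one stacked image needs to leave the camera.
    """
    it = iter(frames)
    prev = np.asarray(next(it), dtype=np.float32)
    accum = np.zeros(prev.shape[:2], dtype=np.float32)
    for frame in it:
        cur = np.asarray(frame, dtype=np.float32)
        accum += np.abs(cur - prev).sum(axis=-1)  # per-pixel motion energy
        prev = cur
    return accum
```

The output of `motion_long_exposure` is exactly the kind of `motion_img` the voxel-splatting sketch above consumes.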
I used footage from @PosyMusic. Thank you very much for the awesome video on motion extraction; it was a major inspiration.
I also used footage from @scottmanley and @veritasium. Fly safe.