shower algorithm
So a few days ago an algorithm for k-largest came to me in the shower, and I’ve been having migraines thinking about the problem since.

It started as:

take a sample of the array

deduce the shape of the array

compile a pivot function

based on that pivot function, move elements to the front of the array, getting a smaller array

repeat
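As far as I can reconstruct it, those steps amount to a sampled quickselect; here’s a rough Python sketch of the idea (the names are mine, and the “compile a pivot function” step is stubbed out as a plain comparison against a sampled median rather than anything JIT-ed):

```python
import random

def k_largest_sampled(a, k, sample_size=16):
    """Sketch of the shower algorithm: sample, derive a pivot, filter, repeat."""
    if k <= 0:
        return []
    if k >= len(a):
        return list(a)
    # "take a sample of the array" / "deduce the shape": use the sample's median as pivot
    sample = random.sample(a, min(len(a), sample_size))
    pivot = sorted(sample)[len(sample) // 2]
    # the "compiled pivot function" here is just a comparison against pivot
    over = [x for x in a if x > pivot]
    eq = [x for x in a if x == pivot]
    under = [x for x in a if x < pivot]
    if k <= len(over):
        return k_largest_sampled(over, k, sample_size)
    if k <= len(over) + len(eq):
        return over + eq[: k - len(over)]
    return over + eq + k_largest_sampled(under, k - len(over) - len(eq), sample_size)
```

Since the pivot is always an element of the slice, the `eq` bucket is never empty and every recursive call strictly shrinks, so this terminates even on degenerate inputs.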

For a while, mid-migraine, I was literally considering writing JIT asm to make that pivot function. I highly don’t recommend this sort of process; very not fun.
Anyway, I eventually settled on a different algorithm that is correct and sane:

do a radix count

process the radix count array to get a pivot number: the digit value whose bucket contains k

run through the array with the radix function, filtering elements that equal the pivot number to the front of the array and numbers under the pivot number to an output array

repeat on the smaller slice with the next radix function
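Here’s a hedged Python sketch of my reading of those steps, written as k-smallest selection (matching the “numbers under the pivot to an output array” direction); the names and the MSD-first digit order are my assumptions, not the actual D code:

```python
def radix_k_smallest(arr, k, digit_bits=8, total_bits=64):
    """Select the k smallest values by repeated MSD radix partitioning."""
    out = []                      # settled elements, known to be among the k smallest
    cur = list(arr)               # the shrinking slice still in play
    mask = (1 << digit_bits) - 1
    shift = total_bits - digit_bits
    while k > 0 and cur and shift >= 0:
        # 1. radix count on the current digit
        counts = [0] * (1 << digit_bits)
        for x in cur:
            counts[(x >> shift) & mask] += 1
        # 2. walk the count array to find the pivot digit: the bucket holding the kth element
        below = 0
        for pivot, c in enumerate(counts):
            if below + c >= k:
                break
            below += c
        # 3. second pass: under-pivot digits go to the output, equal-pivot digits stay in play
        nxt = []
        for x in cur:
            d = (x >> shift) & mask
            if d < pivot:
                out.append(x)
            elif d == pivot:
                nxt.append(x)
        # 4. repeat on the smaller slice with the next digit
        k -= below
        cur = nxt
        shift -= digit_bits
    out.extend(cur[:k])           # anything left over is equal on every digit
    return out
```

Note the two touches per iteration the next paragraph complains about: one pass to count, one pass to move.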

There are more details to make it work, and it does work; but…. it isn’t faster than the existing algorithm: it ties with the pivot-based algorithm in D’s std. (I’m literally crying rn, sadpepe.jpg)
The reasoning why it seems to tie would be: while my code should decrease the problem to 1/256 of its size each pass (for the unaware, 8 bits is the standard size for radix digits once optimized, so for 64-bit numbers you have 8 iterations), it touches the data twice per pass; the pivot-based algorithm only decreases the problem by half each iteration, but touches the data once, and by sloppy math 1 + 1/2 + 1/4 + … = 2. So yes, roughly the same, if you assume that whatever is left after a 1/256 decrease is trivial and ignorable; which…. yeah.
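The sloppy math checks out numerically; a quick sanity check in Python (the two-touches-per-pass and 1/256-per-pass factors are taken straight from the reasoning above):

```python
# Pivot-based select: one touch per pass, slice halves each pass.
pivot_work = sum((1 / 2) ** i for i in range(20))

# Radix select: two touches per pass (count + move), slice shrinks to 1/256 each pass;
# 8 passes covers a 64-bit key with 8-bit digits.
radix_work = sum(2 * (1 / 256) ** i for i in range(8))

print(pivot_work)  # just under 2.0 total passes over the data
print(radix_work)  # just over 2.0 total passes over the data
```

Both come out at roughly two full passes over n elements, which matches the benchmark tie.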
(again literally crying; hold me… why wasn’t I just magically smarter than everyone else with a few days of thinking with migraines)
But I have differently-working code for the problem, and I’m imagining approaches that are maybe loose on correctness but build on the insight that a radix digit + number is a good discriminator function.
What happens if I do a random sample of the array with radix digits and process the count array of this sample with a heuristic of some kind? Assuming I’m OK with being loose on correctness, I could touch the majority of the data once; but the complexities of recursive algorithms with template messes made writing the last version hard, so I want some outside advice and thoughts on the matter, if only to clarify my thoughts into something sane.
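To make the question concrete, here is one naive shape the sampled-histogram step could take; all the names, the sample size, and the missing fall-back policy are hypothetical, since this heuristic is exactly the part I’m unsure about:

```python
import random

def sampled_pivot_digit(arr, k, shift, sample_size=1024):
    """Guess which 8-bit radix bucket holds the kth smallest element, from a sample only."""
    sample = random.sample(arr, min(len(arr), sample_size))
    counts = [0] * 256
    for x in sample:
        counts[(x >> shift) & 0xFF] += 1
    need = k * len(sample) / len(arr)  # rescale k to the sample's size
    total = 0
    for d, c in enumerate(counts):
        total += c
        if total >= need:
            return d
    return 255

def one_pass_step(arr, k, shift):
    """One filtering pass driven by the sampled guess. Exact counts fall out of the
    same pass, so a wrong guess can at least be detected (fall-back omitted)."""
    pivot = sampled_pivot_digit(arr, k, shift)
    under, eq = [], []
    for x in arr:
        d = (x >> shift) & 0xFF
        if d < pivot:
            under.append(x)   # among the k smallest, if the guess held
        elif d == pivot:
            eq.append(x)      # the smaller slice to recurse on
    if len(under) <= k <= len(under) + len(eq):
        return under, eq, k - len(under)  # guess was right: recurse on eq, next digit
    return None                           # guess missed: would need a corrective pass
```

The open question is the fall-back: when the sample misleads `sampled_pivot_digit`, you either redo the pass with the exact counts you just gathered, or bias the guess low so the output array can never overshoot, at the cost of a bigger surviving slice.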
What are some known ways to manage multiple streams of data as you’re filtering? If you’re discarding data downward it’s possible to have nice space complexity, and with radix you can sort nicely; but in between seems kinda a mess, since radix sort depends on perfect knowledge of the counts as far as I’m aware.
Anyway, enough rambling; if anyone has any thoughts, my email is monkyyy@cocaine.ninja if you don’t know me some other way.