#kdtree and ball trees seem cool, but require full knowledge of the thing I'm searching for. What if it's 7 dimensional and I only know 4 of the values?

I feel like a "parallel kd tree" with a separate binary index on each dimension would work better here.

Reduce depth. Allow unspecified values. It'd also be a snap to create and search each dim in parallel.

This must already exist...

#computerscience #math

Last week I added a feature to my mini sub-project that I now realize will only be usable if I can search a 4-dimensional subspace of a medium-large 7-dimensional space in a fraction of a second in #python

I want that feature. The feature will be useful. I already semi-promised the feature to Operations.

I can't actually do that #search even remotely that fast

I had kind of forgotten this #kdtree -like #algorithm idea. Maybe I can make that work...

#software #engineering

@davidr why do you have the subspace? Also, are your 4D vectors binary by any chance? Then you could look at " Multi-Index Hashing" implemented in OpenCV. https://docs.opencv.org/4.10.0/d2/dde/classcv_1_1line__descriptor_1_1BinaryDescriptorMatcher.html
OpenCV: cv::line_descriptor::BinaryDescriptorMatcher Class Reference

@marcorobotics here we go!

k separate binary trees makes sense to me and I don't get why you'd mix the indexes level by level. I guess so you can....slowly spiral in "breadth first"....?

Not sure how well this implementation couples to my data, but knowing it isn't obviously stupid helps a lot

@davidr "it's 7 dimensional and I know 4 of the values" = "it lies within this bounding hypercube". you're probably not going to be able to do much better than find the LCA of the hypercube in the kd-tree and do an O(n) search of the subtree rooted at that node.
@davidr if the dimensions you know are the first 4 you could also use a sorted array of morton codes. the values of those 4 dimensions are a common prefix so you can just do a single binary search and a linear scan

@bob Part of my hesitance is that I'm not that familiar with these data structures. My understanding is that the kd tree splits first on dim 1, then dim 2....k, then dim 1 again, etc, until max depth/min node size is reached.

If I'm missing some dimensional values, how do I bridge those gulfs and continue on into further subtrees of the dims I do have?

Maybe that's what "LCA of the hypercube" means.

@davidr @recursive I don’t have a library to offer you, but I did some exploration around this. In the more general case, you can weight each of your seven dimensions. If you want something, that’s exactly like four of them, but you have a lot of flexibility on the other three, it can represent that.

I kind of feel that’s what Spotify’s rec algo does sometimes. “ here’s a song by the same artist. Now here’s a song made in the same year. Now here’s a song with the same word in the title.”

@brohrer @recursive Yeah, they probably do. Which reminds me, a tree like this would also be easy to add dimensions without recomputing anything else.

My actual application is 3D positions + time, with optional velocities. More specifically, it'd be 2D pos + time all as ranges, a third pos as a disjoint set of very small ranges and optional velocity ranges.

Interesting that nobody named something that does this. Possibly it's just an obvious neighbor of KD trees and regular old DB tables.

@davidr @recursive it sounds like you’re working with a very specific application, possibly hardware. I’ve been surprised how many named methods fall short of what’s needed to solve actual problems, and how low the fruit is to be plucked for doing new and useful work once you get your hands dirty with an application.
@brohrer @recursive That's motivating and terrifying! "Maybe I'll discover something....IF I HURRY!"