Datastructure for intervals across 64-bits or 128-bits??

https://lemmy.world/post/27759427

Datastructure for intervals across 64-bits or 128-bits?? - Lemmy.World

I’m doing some Galois Field / Cyclic Redundancy Check research for fun and I’ve come across an intriguing pattern that I need a data structure for. Across the 64-bit (or even 128-bit or larger) spaces, I’ve discovered an interesting pattern relating to hamming distances that I’d like a data structure to represent. I’m going to need something on the order of ~billions of intervals each having somewhere between 1 item to ~1 billion per interval. And I’d like to quickly (O(1) or O(lg(n))) determine if other intervals intersect. ------ For 32-bit space I can simply make a 512MB Bitmask lol and then AND/OR the two Bitmask. Easy But for 64-bit space I’m stuck and a bit ignorant to various data structures. I’m wondering if someone out there has a good data structure for me to use? I’ve read over Interval Trees on Wikipedia. I’m also considering binary decision diagram over the 64-bits actually. Finally I’m thinking of some kind of 1-dimension octtree like datastructure (is that just a binary tree?? Lol. But BVH trees in 3d space seems similar to my problem it’s just I need it optimized down to 1 dimension rather than 3.) Anyone else have any other ideas or cool data structures that might work?

Ph-trees can do range and closest queries across N dimensions very quickly. I have not used it for 1 dimension, but I’d imagine it would work fine.

github.com/tzaeschke/phtree

GitHub - tzaeschke/phtree: PH-Tree

PH-Tree. Contribute to tzaeschke/phtree development by creating an account on GitHub.

GitHub

I have not used it for 1 dimension

Querying for all intervals (x’, y’) that overlap with an interval (x, y) is a two dimensional range query that selects the upper left quadrant of the x-y plane that satisfies x < y’ && x’ < y

Great answer!!

After thinking about all this for a while, I’ve gone with the basic binary tree (leaning towards AVL tree as I expect my use case to be read heavy).

In my use case, multiple ‘intervals’ can merge together without major penalty (and should be merged together). It looks like a lot of these interval trees (including ph trees) are best when the intervals need to be kept separate.

There is a part of my algorithm where ph trees might be useful though. I’ll have to give it some though.