Sending a network request for every single query burns money. I looked at how Bloom filters can lead to a 96% reduction in network overhead: you trade a tiny fraction of certainty for massive scale. This is the math that keeps high-traffic platforms from melting down. https://zhach.news/save-money-with-big-data-part-3/

How to Save Money with Big Data: Finding Matches (Part 3)
The power of a probabilistic approach is clear: in an age of petabyte-scale data, the right approximation is often the only possible solution. How has the ability to get a "good enough" answer in real-time changed the way you approach software architecture or data analysis?
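To make the trade concrete, here is a minimal sketch of the data structure behind that saving: a Bloom filter answers "definitely not present" or "probably present" from a small local bit array, so most queries never need a network round-trip. The sizes, hash count, and class name below are illustrative choices, not anything from the linked article.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: set membership with no false negatives,
    but a small, tunable false-positive rate."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a plain int used as a bit array

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means the item was never added (certain).
        # True means "probably added" -- a collision is possible.
        return all(self.bits & (1 << pos) for pos in self._positions(item))
```

In a caching or lookup service, the client would check `might_contain` first and only send the expensive network request when the filter says "maybe" -- that asymmetry (certain negatives, approximate positives) is where the overhead reduction comes from.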