This account is a replica from Hacker News. Its author can't see your replies. If you find this service useful, please consider supporting us via our Patreon.
| Official | https:// |
| Support this service | https://www.patreon.com/birddotmakeup |
This is so non-specific as to be meaningless.
Like actually tell us what you know so we can make useful decisions about our safety.
The Atlantic, WSJ, The Economist, Politico all come to mind as profitable.
I don’t think it’s anomalous to have a major national newspaper that’s profitable. And WaPo should have been absolutely primed for Trump II, given its long-time DC focus. It has historically had the best political coverage of DC.
If you’re getting started in, say, Claude, some pointers that helped me:
Stay in plan mode most of the time. It will produce a step-by-step set of instructions - more context - for the LLM to execute the change. It’s the best place to exert detailed control over what will happen. Claude lets you edit it in a vim window.
Think about testing strategy carefully. Connecting the feedback back into the LLM is what makes a lot of the magic happen. But it requires thought or the LLM might cheat or you get a suboptimal result.
Then with these two you spend your time thinking in terms of product correctness - good tests - and implementation plan - deciding if the LLM has a sane grasp of the problem and will create a sane result.
You’re at a higher level of abstraction, still caring about details, but rarely up to your elbows in finicky line-by-line code.
If you can get good at these you’re well on your way.
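As a hedged sketch of what “good tests” for agent feedback can look like (the function and cases here are hypothetical, not from any real codebase): pin observable behavior and properties of the output, not implementation details, so the model can’t “pass” by special-casing one input.

```python
# Hypothetical example of a behavior-level test used as agent feedback.
import re
import string

def slugify(title: str) -> str:
    # The kind of function you might ask the agent to write or modify.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def test_slugify_behavior():
    # Concrete cases pin the outcome the product needs.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced   out  ") == "spaced-out"
    # Property check: output alphabet is restricted to [a-z0-9-],
    # which is much harder to cheat than a single memorized case.
    allowed = set(string.ascii_lowercase + string.digits + "-")
    assert set(slugify("Mixed CASE 42")) <= allowed
```

Running this under pytest (or just calling the test function) closes the loop: the agent gets a pass/fail signal tied to behavior it can’t trivially game.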
At very high scale there's less use of graphs, or a layer of clustering is added on top of them.
Graphs can be complex to build and rebalance, and graph-like data structures, a node here with a pointer out to another node there, aren't very cache friendly.
Add to that, people almost always want to *filter* vector search results, and this is a huge blind spot for both consumers and providers. It's where the ugly performance surprises come from. Filtered HNSW isn't straightforward: you have to just keep traversing the graph looking for results that satisfy your filter.
HNSW came out of a benchmark regime where we just indexed some vectors and tried to maximize recall at a given query latency. It doesn't take into account the filtering / indexing almost everyone wants.
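A minimal sketch of why filtering hurts graph-based ANN (this is not real HNSW; all names and the toy graph are illustrative): a best-first search has to keep expanding the graph until it finds k neighbors that also pass the filter, so a selective filter multiplies the number of hops.

```python
# Toy best-first graph search with post-filtering, to show how
# selective filters force extra traversal. Not an HNSW implementation.
import heapq

def filtered_greedy_search(graph, dist, entry, k, passes_filter):
    """graph: node -> list of neighbor nodes; dist: node -> distance."""
    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap ordered by distance
    results = []                         # nodes that satisfy the filter
    hops = 0
    while candidates and len(results) < k:
        d, node = heapq.heappop(candidates)
        hops += 1
        if passes_filter(node):
            results.append((d, node))
        for nb in graph.get(node, []):
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(candidates, (dist(nb), nb))
    return results, hops

# On a simple chain graph, a filter that passes 1-in-2 nodes roughly
# doubles the hops; a 1-in-10 filter roughly multiplies them by 10.
chain = {i: [i + 1] for i in range(50)}
_, hops_loose = filtered_greedy_search(chain, lambda n: n, 0, 3, lambda n: n % 2 == 0)
_, hops_tight = filtered_greedy_search(chain, lambda n: n, 0, 3, lambda n: n % 10 == 0)
```

The same effect in a real graph index means latency scales inversely with filter selectivity, which is exactly the surprise people hit in production.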
Turbopuffer, for example, doesn't use graphs at all; it uses SPFresh. And they recently got 200ms latency on 100B vectors.