Binary Representation + Hamming Distance 時的搜尋演算法

上個月看到「Hamming Distance for Hybrid Search in SQLite (via)」的時候引起的興趣,有些 embedding model 在訓練時有對 binary representation 調過的,即使把 fp32 或是 fp16 (或是 bf16) 的 embedding 降到 binary representation 後用 hamming distance 仍然有很好的...

Gea-Suan Lin's BLOG

Here's an informal and (hopefully simple to understand) by-example derivation of "next larger integer with same popcount" implementation that saturates to known when none exists. From there builds "closest smaller integer", "at runtime selected direction" and shows the connection to recent post by @harold that walks to "closest". If I didn't screw up it should be UB free.

https://marc-b-reynolds.github.io/math/2023/11/09/PopNextPrev.html

#bithack #popcount

Popcount walks: next, previous, toward and nearest

Informal (and by-example) derivation of walking adjacent integers with the same population count as the input.

Marc-B-Reynolds.github.io

My insufferable brain keeps repeating to me (for the past week) that if 'x' has popcount 'p' then bit_gather(~0,x) is the integer with the lowest 'p' bits set.

Maybe this post will make it stop.

(edit: PEXT on intel)

#bithack #popcount

New blog post: "Reducing 'gate' counts for Kyber-512: Two algorithm analyses, from first principles, contradicting NIST's calculation." https://blog.cr.yp.to/20231023-clumping.html #xor #popcount #gates #memory #clumping Also via Cloudflare given the frequent DoS attacks: https://blog-cr-yp-to.viacache.net/20231023-clumping.html
cr.yp.to: 2023.10.23: Reducing "gate" counts for Kyber-512