No #Zobrist hash collisions were found among the unique #chess positions from lichess games (2013-2014). The study will expand to cover games through 2023. Despite optimizing #SQLite for the growing dataset, import times remain high, prompting a potential switch to #PostgreSQL for parallel imports. Final results will be published in #SQLite format. #jja
The #Zobrist hash function used in Polyglot opening books is roughly 2.4 times faster than the #Zobrist hash #Stockfish uses for its transposition table. One reason is that the former uses a constant array of random u64 numbers as its source of randomness whereas the latter uses a #PRNG with a constant seed. Also, the former requires only pseudo-legality for en passant whereas the latter requires full legality. #jja #chess
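A minimal sketch of the table-based scheme the Polyglot side uses, in Python. The piece indexing, seed, and square numbering here are my own hypothetical stand-ins, not Polyglot's actual constants: the point is only that a fixed table of random u64s is XORed per (piece, square), which also makes incremental updates a pair of XORs.

```python
import random

# Hypothetical stand-in for a fixed constant table: one random 64-bit
# number per (piece 0..11, square 0..63) pair. Polyglot ships such a
# table as literal constants; we fake it with a seeded PRNG here.
random.seed(0xC0FFEE)
ZOBRIST = [[random.getrandbits(64) for _ in range(64)] for _ in range(12)]

def zobrist_hash(pieces):
    """pieces: iterable of (piece_index, square) pairs; XOR-fold the table."""
    h = 0
    for piece, square in pieces:
        h ^= ZOBRIST[piece][square]
    return h

# XOR makes updates cheap: moving a piece toggles just two table entries.
# Hypothetical encoding: piece 0 = white pawn, piece 6 = black pawn,
# squares numbered a1=0 .. h8=63, so e2=12, e4=28, e7=52.
h = zobrist_hash([(0, 12), (6, 52)])
h ^= ZOBRIST[0][12] ^ ZOBRIST[0][28]  # white pawn e2 -> e4
assert h == zobrist_hash([(0, 28), (6, 52)])
```

The same XOR trick is why engines can keep the hash current during search instead of rehashing the whole board after every move.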
% for p in lichess_db_standard_rated_20*.pgn.zst; do echo "-- ${p} --"; jja dump "$p" | jja restore /caissa/pgn/alip.jja || break; done
restore returns non-zero when a hash collision is detected, so a break means a collision. This way we hope to detect at least one #Zobrist hash collision. We have roughly 4.5T of free space atm. #yolo #jja #chess #sqlite #lichess
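A sketch of the kind of collision check that non-zero exit status implies, assuming (hypothetically; the table name, column names, and `record` helper are mine, not jja's) a table keyed by hash that also stores the position: a collision is the same hash arriving with a different position, while a repeat of the same position is just a transposition.

```python
import sqlite3

# Hypothetical schema: one row per Zobrist hash, with the position (e.g. a
# FEN string) stored alongside so collisions are distinguishable from
# transpositions. SQLite's INTEGER holds a signed 64-bit value, which is
# wide enough for a u64 hash reinterpreted as signed.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE positions (hash INTEGER PRIMARY KEY, fen TEXT NOT NULL)")

def record(h, fen):
    """Return False only when a genuine hash collision is detected."""
    row = db.execute("SELECT fen FROM positions WHERE hash = ?", (h,)).fetchone()
    if row is None:
        db.execute("INSERT INTO positions VALUES (?, ?)", (h, fen))
        return True
    return row[0] == fen  # same position: transposition; different: collision

assert record(42, "some-fen")        # first sighting
assert record(42, "some-fen")        # same hash, same position: fine
assert not record(42, "other-fen")   # same hash, different position: collision
```

An importer built this way can simply exit non-zero the first time the check fails, which is what the `|| break` in the shell loop relies on.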
For purely academic reasons, I have added some tooling around #jja to compile a huge database mapping #Zobrist hashes to #chess positions, structured so that we can detect hash collisions. I have indexed 100 million unique positions so far. No collisions found. How many #Zobrist hashes is it going to take to spot a collision, I wonder?
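A back-of-the-envelope answer via the birthday paradox (my own arithmetic, not a result from this study): with a 64-bit hash, reaching a 50% chance of at least one collision takes about sqrt(2 ln 2 · 2^64) ≈ 5.1 billion distinct positions, so 100 million is still very early days.

```python
import math

N = 2**64  # size of the 64-bit hash space

# Positions needed for a ~50% collision chance: sqrt(2 * ln 2 * N).
n_half = math.sqrt(2 * math.log(2) * N)
print(f"50% collision point: {n_half:.2e}")  # roughly 5.06e9

# Collision probability after n positions: 1 - exp(-n(n-1) / 2N).
n = 100_000_000
p = 1 - math.exp(-n * (n - 1) / (2 * N))
print(f"probability at 100M positions: {p:.1e}")  # about 2.7e-4
```

So at the current 100 million positions the expected collision probability is only a few in ten thousand; the lichess corpus through 2023 should push well past a billion positions and make a hit far more plausible.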