@wyatt Now I'm intrigued. Does your university's data structures class cover implementing a hash table from scratch? Say one that maps either fixed-length or zero-terminated character strings to 32-bit integers. I ask because one thing that the ISO C standard library lacks is any sort of map, dictionary, or associative array data structure. It's not like a dynamic array, where a thin wrapper around realloc() is adequate. (hsearch() is non-ISO and allows only one hash table per process.)

#HashTable #CLanguage #DataStructures #DynamicArray
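Here is roughly what such an exercise might look like: a minimal ISO C99 sketch of an open-addressing (linear probing) table that maps zero-terminated strings to `int32_t`. The `strmap_*` names, the FNV-1a hash and the "grow when half full" policy are our own assumptions. None of them comes from any course or library. Deletion is left out because it needs tombstones or backward shifting. For fixed-length keys you would store the key length and use `memcmp()` instead of `strcmp()`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One slot of the table; key == NULL marks an empty slot. */
typedef struct {
    char   *key;    /* owned copy of a zero-terminated string */
    int32_t value;
} strmap_slot;

typedef struct {
    strmap_slot *slots;
    size_t cap;     /* always a power of two */
    size_t len;     /* number of occupied slots */
} strmap;

/* 32-bit FNV-1a over a zero-terminated string. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

static bool strmap_init(strmap *m, size_t cap) {
    size_t c = 8;
    while (c < cap) c <<= 1;          /* power of two, so hash & (cap - 1) is an index */
    m->slots = calloc(c, sizeof *m->slots);
    m->cap = c;
    m->len = 0;
    return m->slots != NULL;
}

/* Linear probing: return the slot holding key, or the empty slot where it belongs. */
static strmap_slot *strmap_find_slot(strmap_slot *slots, size_t cap, const char *key) {
    size_t i = fnv1a(key) & (cap - 1);
    while (slots[i].key && strcmp(slots[i].key, key) != 0)
        i = (i + 1) & (cap - 1);
    return &slots[i];
}

/* Double the capacity and re-place every key. */
static bool strmap_grow(strmap *m) {
    size_t ncap = m->cap * 2;
    strmap_slot *ns = calloc(ncap, sizeof *ns);
    if (!ns) return false;
    for (size_t i = 0; i < m->cap; ++i)
        if (m->slots[i].key)
            *strmap_find_slot(ns, ncap, m->slots[i].key) = m->slots[i];
    free(m->slots);
    m->slots = ns;
    m->cap = ncap;
    return true;
}

/* Insert or overwrite; the load factor stays <= 1/2, so probing always terminates. */
static bool strmap_put(strmap *m, const char *key, int32_t value) {
    if (2 * (m->len + 1) > m->cap && !strmap_grow(m)) return false;
    strmap_slot *s = strmap_find_slot(m->slots, m->cap, key);
    if (!s->key) {
        size_t n = strlen(key) + 1;
        char *copy = malloc(n);
        if (!copy) return false;
        memcpy(copy, key, n);
        s->key = copy;
        m->len++;
    }
    s->value = value;
    return true;
}

static bool strmap_get(const strmap *m, const char *key, int32_t *out) {
    strmap_slot *s = strmap_find_slot(m->slots, m->cap, key);
    if (!s->key) return false;
    *out = s->value;
    return true;
}

static void strmap_free(strmap *m) {
    for (size_t i = 0; i < m->cap; ++i) free(m->slots[i].key);
    free(m->slots);
}
```

Unlike `hsearch()`, each `strmap` is an ordinary value, so a program can keep as many tables as it likes.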

GitHub - Cranot/grouped-simd-hashtable: High-performance C++ hash table using grouped SIMD metadata scanning. Beats SOTA at scale.

GitHub

The January edition of Communications of the ACM has an article about speeding up the already speedy #hashtable. I was a little surprised by this, because in #computerscience hash tables already have O(1) average-case lookup complexity. But I guess they mean reducing the number of steps, not improving the runtime complexity.
#programming
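The distinction between complexity class and step count can be made concrete. A hash table's expected lookup cost is O(1) only for a fixed load factor, and the constant hidden inside that O(1) grows as the table fills. The sketch below counts how many slots each insertion inspects. It assumes linear probing and uses the MurmurHash3 finalizer on sequential integers as a stand-in hash. Neither choice is taken from the CACM article.

```c
#include <stdint.h>
#include <stdlib.h>

/* MurmurHash3 32-bit finalizer: a cheap, well-mixing bijection on uint32_t. */
static uint32_t fmix32(uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Insert n distinct keys into an empty linear-probing table of cap slots and
   return the average number of slots inspected per insertion (1.0 means every
   home slot was free). Returns -1.0 on bad arguments or allocation failure. */
static double avg_insert_probes(size_t cap, size_t n) {
    if (cap == 0 || n == 0 || n > cap) return -1.0;
    unsigned char *used = calloc(cap, 1);
    if (!used) return -1.0;
    uint64_t total = 0;
    for (size_t k = 0; k < n; ++k) {
        size_t i = fmix32((uint32_t)k) % cap;   /* home slot */
        size_t probes = 1;
        while (used[i]) {                        /* walk to the next free slot */
            i = (i + 1) % cap;
            ++probes;
        }
        used[i] = 1;
        total += probes;
    }
    free(used);
    return (double)total / (double)n;
}
```

For linear probing, the classic analysis predicts roughly ½(1 + 1/(1 − α)) probes on average at load factor α. That is about 1.5 at α = 0.5 and 5.5 at α = 0.9. Shaving steps off that constant is therefore worthwhile even though the complexity class stays the same.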
Wednesday Links - Edition 2025-12-24 🎅🎄🎁

Building a Fast, Memory-Efficient Hash Table in Java (by borrowing the best ideas) (15...

DEV Community

🚀 MASTERING PYTHON DICTIONARIES: THE O(1) SECRET FOR DSA! 🐍

Tired of slow searches in lists? Discover how dict works as a supercharged hash table: internal hashing, collision resolution, average-case O(1) insertion/lookup/deletion, plus real examples for graphs, caches, and LeetCode!

Since Python 3.7: insertion order guaranteed! 💥

👉 Read the full technical guide:
https://bolha.blog/riverfount/dominando-dicionarios-em-python-o-segredo-o-1-para-dsa-eficiente

#Python #DSA #DataStructures #HashTable #Programação #DevOps

Mastering Python Dictionaries: The O(1) Secret to Efficient DSA!

Looking for Python dictionaries for DSA, hash tables in Python, Python dict Big O complexity, or advanced Python data structures? ...

Riverfount
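The ordering guarantee follows from the layout CPython has used for dicts since 3.6. Entries live in a dense array kept in insertion order, and a separate sparse index table hashes into that array. The C sketch below imitates this idea under simplifying assumptions of our own: `int32_t` keys, linear probing instead of CPython's perturbed probe sequence, a fixed capacity, and no deletion. The `odict_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define ODICT_EMPTY (-1)

typedef struct { int32_t key, value; } odict_entry;

typedef struct {
    int32_t     *index;    /* sparse hash table: slot -> position in entries */
    size_t       mask;     /* index capacity - 1 (capacity is a power of two) */
    odict_entry *entries;  /* dense array, in insertion order */
    size_t       len, max; /* entries used / allowed (half the index size) */
} odict;

static uint32_t mix32(uint32_t h) {  /* MurmurHash3 finalizer used as the hash */
    h ^= h >> 16; h *= 0x85ebca6bu; h ^= h >> 13; h *= 0xc2b2ae35u; h ^= h >> 16;
    return h;
}

/* The index gets 2^log2_cap slots (log2_cap >= 1); at most half are ever used. */
static bool odict_init(odict *d, unsigned log2_cap) {
    size_t cap = (size_t)1 << log2_cap;
    d->index = malloc(cap * sizeof *d->index);
    d->entries = malloc((cap / 2) * sizeof *d->entries);
    if (!d->index || !d->entries) { free(d->index); free(d->entries); return false; }
    for (size_t i = 0; i < cap; ++i) d->index[i] = ODICT_EMPTY;
    d->mask = cap - 1;
    d->len = 0;
    d->max = cap / 2;
    return true;
}

/* Index slot that points at key, or the empty slot where it would go. */
static size_t odict_slot(const odict *d, int32_t key) {
    size_t i = mix32((uint32_t)key) & d->mask;
    while (d->index[i] != ODICT_EMPTY && d->entries[d->index[i]].key != key)
        i = (i + 1) & d->mask;
    return i;
}

/* Insert or update; an update keeps the key's original position, as in Python. */
static bool odict_put(odict *d, int32_t key, int32_t value) {
    size_t i = odict_slot(d, key);
    if (d->index[i] != ODICT_EMPTY) { d->entries[d->index[i]].value = value; return true; }
    if (d->len == d->max) return false;    /* full: a real table would resize here */
    d->entries[d->len] = (odict_entry){ key, value };
    d->index[i] = (int32_t)d->len++;
    return true;
}

static bool odict_get(const odict *d, int32_t key, int32_t *out) {
    size_t i = odict_slot(d, key);
    if (d->index[i] == ODICT_EMPTY) return false;
    *out = d->entries[d->index[i]].value;
    return true;
}

static void odict_free(odict *d) {
    free(d->index);
    free(d->entries);
}
```

Iterating in insertion order is then just a loop over `entries[0]` to `entries[len - 1]`. No extra linked list is needed.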

Emu68 v1.1: More new features

Developer Michal Schulz continues detailing what's new in Emu68 v1.1. This time he focuses on cache management and a new mechanism called Dumpster, designed to noticeably improve performance in scenarios where frequent cache flushes were hurting speed.

In previous versions, every time AmigaOS executed a LoadSeg (a routine action at every system boot or when loading new programs), the instruction cache was flushed and Emu68 had to recompile its JIT blocks from scratch. In normal use this cost was acceptable. The problem became worse with MacOS emulation through ShapeShifter or Fusion, however, because that operating system forces dozens of flushes per second. The result was general sluggishness and an experience inferior even to classic 68040 or 68060 accelerator cards.

Schulz observed that, in most cases, memory does not change after a cache flush. This led to the idea of not discarding the compiled blocks. Instead, they are stored temporarily in a "container" (the Dumpster) together with a CRC32 checksum of the associated memory region. If the contents remain unchanged, the block can be reused without recompilation and recovered in mere microseconds.
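The Dumpster check described above boils down to two steps. First, remember a checksum of the guest memory a block was compiled from. Later, accept the block again only if a fresh checksum matches. Below is a minimal C sketch. It assumes the standard reflected CRC-32 (polynomial 0xEDB88320, the one zlib uses) and a hypothetical `parked_block` record. Emu68's actual data structures and CRC routine are not shown in the source. A matching CRC makes an unchanged region overwhelmingly likely rather than certain, which is the trade-off any checksum-based scheme accepts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as in zlib). */
static uint32_t crc32_bytes(const void *data, size_t len) {
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical record for a compiled block parked instead of discarded. */
typedef struct {
    const void *guest_addr;  /* start of the 68k code the block was compiled from */
    size_t      guest_len;   /* length of that code in bytes */
    uint32_t    crc;         /* CRC32 of the code when the block was parked */
    void       *host_code;   /* translated host code (opaque here) */
} parked_block;

/* On a cache flush: record the checksum instead of throwing the block away. */
static void park_block(parked_block *b) {
    b->crc = crc32_bytes(b->guest_addr, b->guest_len);
}

/* On a later lookup: reuse the block only if the guest code still matches. */
static bool can_reuse(const parked_block *b) {
    return crc32_bytes(b->guest_addr, b->guest_len) == b->crc;
}
```

A block whose check fails is simply recompiled, just as every block was before the Dumpster existed.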

The first approach used a second hash table. However, it meant shrinking the cache and more collisions, which reduced speed in other contexts. The final solution came with the concept of an Epoch: a global counter that marks how long JIT blocks remain valid.

  • When the cache is flushed, it is enough to increment the Epoch value.
  • Each JIT block keeps an associated Epoch number.
  • If its checksum is valid and its Epoch matches the current one, it can be reused immediately.
  • Combining the hash table with the LRU cache makes lookups faster and more efficient.
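The bullets leave some details open, so the following C sketch rests on one plausible reading. This reading is our assumption, not Emu68's code:

- A flush just increments a global epoch.
- A block stamped with the current epoch is reused directly.
- A block from an older epoch is revived only if the checksum of its guest code still matches, and it is then re-stamped.

The chained `buckets` table, `NBUCKETS` and all other names are hypothetical. The LRU eviction mentioned above is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 256u   /* hypothetical size of the block hash table */

typedef struct jit_block {
    uint32_t guest_pc;        /* 68k address the block was translated from */
    uint32_t crc;             /* checksum of the guest code at translation time */
    uint32_t epoch;           /* epoch in which the block was last known valid */
    struct jit_block *next;   /* hash-chain link */
} jit_block;

static jit_block *buckets[NBUCKETS];
static uint32_t current_epoch = 1;

/* 68k code is word-aligned, so drop the low bit before bucketing. */
static size_t bucket_of(uint32_t pc) { return (pc >> 1) % NBUCKETS; }

/* A cache flush walks, frees, and moves nothing: it only opens a new epoch. */
static void cache_flush(void) { ++current_epoch; }

static void insert_block(jit_block *b, uint32_t crc) {
    b->crc = crc;
    b->epoch = current_epoch;
    b->next = buckets[bucket_of(b->guest_pc)];
    buckets[bucket_of(b->guest_pc)] = b;
}

/* current_crc is the checksum of the guest code at pc right now. A real
   implementation would compute it lazily, only when the epochs differ. */
static jit_block *lookup_block(uint32_t pc, uint32_t current_crc) {
    for (jit_block *b = buckets[bucket_of(pc)]; b; b = b->next) {
        if (b->guest_pc != pc) continue;
        if (b->epoch == current_epoch) return b;   /* still valid in this epoch */
        if (b->crc == current_crc) {               /* older epoch, code unchanged: */
            b->epoch = current_epoch;              /* revive without recompiling */
            return b;
        }
        return NULL;                               /* code changed: caller recompiles */
    }
    return NULL;
}
```

Here `cache_flush()` is O(1) and touches no block. That matches the point below that the scheme avoids unnecessary data movement and keeps the hash table at full size.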

This system avoids unnecessary data movement and keeps the hash table at its full size. It also drastically reduces the performance loss in critical situations such as MacOS emulation.

With these improvements, AmigaOS and MacOS boot significantly faster, and the JIT cache is used much more efficiently. The new approach also frees an extra Raspberry Pi core for future projects, such as possible PowerPC emulation within the same environment.

The Dumpster is still being refined. Even so, initial tests show no adverse effects and confirm an important step forward in the Emu68 user experience. MacOS 7.5 can now boot in just a few seconds, a milestone unthinkable in earlier versions.

https://www.youtube.com/watch?v=JNZGNLweUyQ

#amigaos #arranqueRápido #caché #CRC32 #Dumpster #Emu68V11 #emulación #Epoch #Fusion #hashTable #jit #LoadSeg #LRUCache #macos #MichalSchulz #powerpc #raspberryPi #recompilación #rendimiento #ShapeShifter

Undergraduate Upends a 40-Year-Old Data Science Conjecture : programming – Andrew Krapivin et al. invent a faster hashing algorithm

From a while back: [Wayback/Archive] Undergraduate Upends a 40-Year-Old Data Science Conjecture : programming, which has a “TL;DR for non CS people” and a “Here’s an explanation”, both well worth reading.

It’s about the work of Andrew Krapivin with co-authors Martín Farach-Colton and William Kuszmaul.

A young computer scientist and two colleagues show that searches within data structures called hash tables can be much faster than previously deemed possible.
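To put "much faster" in concrete terms, consider an open-addressed table filled to a fraction 1 − δ. As we understand the paper's abstract, its headline results are the bounds below, stated as expected probe counts with constants omitted. Lines without a qualifier are worst-case bounds.

```latex
\begin{aligned}
&\text{uniform probing (Yao's 1985 conjecture: optimal for greedy schemes)}: && \Theta\left(\delta^{-1}\right)\\
&\text{funnel hashing (greedy)}: && O\left(\log^{2}\delta^{-1}\right)\\
&\text{elastic hashing (non-greedy), worst case}: && O\left(\log\delta^{-1}\right)\\
&\text{elastic hashing, amortized}: && O(1)
\end{aligned}
```

For δ = 0.001 (a table 99.9% full), δ⁻¹ = 1000, whereas log₂²δ⁻¹ ≈ 99. This comparison ignores constant factors.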

Reminder to self: find any real-world implementations of this new hashing algorithm.

Related material: the “easier” article [Wayback/Archive] Undergraduate Upends a 40-Year-Old Data Science Conjecture | Quanta Magazine, which refers to the actual paper:

The discovery feels somewhat similar to the George Dantzig story, in which he solved two open problems in statistical theory that he had mistaken for homework.

Note that, for me, the “easiest” explanation still is in the Reddit post I referenced at the top.

Via:

  • [Wayback/Archive] Wladimir Mufty: “Sometimes a fresh perspective …” – SURF Mastodon Pilot

    Sometimes a fresh perspective leads to revolutionary insights 💚. Without intending to, Andrew Krapivin, as an undergraduate student 👨‍🎓, overturned the 40-year-old paradigm on #hashtable lookups 🗄️!

    After the #DeepSeek results, another great example that #innovation does not always come from business cases or throwing billions 💰 and energy resources into the game. Sometimes it simply comes from curiosity and a different way of looking at a challenge!

  • [Wayback/Archive] ShīnChvën – Revolutionizing Hash Tables: An Undergraduate’s Breakthrough
  • Query: [Wayback/Archive] Andrew Krapivin faster hash table implementation – Google Search

--jeroen

#DeepSeek #hashtable #innovation

Undergraduate Upends a 40-Year-Old Data Science Conjecture

Posted in r/programming by u/Stackitu • 478 points and 65 comments

reddit

#Undergraduate Disproves 40-Year-Old Conjecture, Invents New Kind of #HashTable
A young computer scientist and two colleagues show that searches within data structures called #hashtables can be much faster than previously deemed possible.
https://www.wired.com/story/undergraduate-upends-a-40-year-old-data-science-conjecture/
https://archive.ph/2Tl3q

Still working on #swad, and currently very busy improving quality, with most of the actual work done inside my #poser library.

After finally supporting #kqueue and #epoll, I have now integrated #xxhash to completely replace my previous stupid and naive hashing. I also added a more involved #dictionary class as an alternative to the already existing #hashtable. The hashtable's size must be preconfigured, and collisions are only ever resolved by chaining (storing linked lists). The new dictionary instead dynamically nests multiple hashtables, using different bits of a single hash value. I hope this achieves acceptable scaling while also keeping memory overhead acceptable ...
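The nested-dictionary idea can be sketched as follows. Note that this is a hypothetical reconstruction from the description above, not poser's actual code or API. Each level is a fixed 16-bucket table indexed by the next 4 bits of one 64-bit hash. A bucket whose chain grows past a small threshold is split into a child table. FNV-1a stands in for xxHash to keep the sketch dependency-free. The names, the 4-bit width and the threshold of 4 are our own choices.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BITS      4               /* hash bits consumed per level */
#define WIDTH     (1u << BITS)    /* buckets per table */
#define MAX_CHAIN 4               /* split a bucket once its chain exceeds this */
#define MAX_DEPTH (64 / BITS)     /* levels available from one 64-bit hash */

typedef struct entry { const char *key; uint64_t hash; int value; struct entry *next; } entry;
typedef struct table table;
typedef struct { table *child; entry *list; size_t count; } bucket;
struct table { bucket b[WIDTH]; };

/* 64-bit FNV-1a, a stand-in for xxHash. */
static uint64_t fnv1a64(const char *s) {
    uint64_t h = 14695981039346656037ull;
    for (; *s; ++s) { h ^= (unsigned char)*s; h *= 1099511628211ull; }
    return h;
}

/* Bucket index at a given depth: the depth-th group of BITS bits of the hash. */
static unsigned slot(uint64_t hash, int depth) {
    return (unsigned)(hash >> (depth * BITS)) & (WIDTH - 1);
}

static entry *dict_find(table *t, const char *key) {
    uint64_t h = fnv1a64(key);
    for (int d = 0; t; ++d) {
        bucket *bk = &t->b[slot(h, d)];
        if (bk->child) { t = bk->child; continue; }   /* descend one level */
        for (entry *e = bk->list; e; e = e->next)
            if (e->hash == h && strcmp(e->key, key) == 0) return e;
        return NULL;
    }
    return NULL;
}

/* Insert or update; key must outlive the dictionary. Returns -1 on OOM. */
static int dict_put(table *root, const char *key, int value) {
    entry *e = dict_find(root, key);
    if (e) { e->value = value; return 0; }
    uint64_t h = fnv1a64(key);
    table *t = root;
    int d = 0;
    bucket *bk = &t->b[slot(h, d)];
    while (bk->child) { t = bk->child; ++d; bk = &t->b[slot(h, d)]; }
    if (!(e = malloc(sizeof *e))) return -1;
    *e = (entry){ key, h, value, bk->list };
    bk->list = e;
    if (++bk->count > MAX_CHAIN && d + 1 < MAX_DEPTH) {
        table *c = calloc(1, sizeof *c);
        if (!c) return 0;             /* keep the long chain: still correct, just slower */
        while (bk->list) {            /* move entries down, keyed by the next BITS bits */
            entry *m = bk->list;
            bk->list = m->next;
            bucket *cb = &c->b[slot(m->hash, d + 1)];
            m->next = cb->list;
            cb->list = m;
            cb->count++;
        }
        bk->count = 0;
        bk->child = c;
    }
    return 0;
}

static void dict_free(table *t) {
    for (unsigned i = 0; i < WIDTH; ++i) {
        if (t->b[i].child) dict_free(t->b[i].child);
        for (entry *e = t->b[i].list, *n; e; e = n) { n = e->next; free(e); }
    }
    free(t);
}
```

Each level reuses bits of the hash that was already computed, so a split never rehashes a key. A lookup descends at most 16 levels (64 bits / 4) before scanning one usually short chain.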

#swad already uses both container classes as appropriate.

Next I'll probably revisit poser's #threadpool. I think I could replace #pthread condition variables with "simple" #semaphores, which should also reduce overhead ...
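A minimal sketch of the semaphore idea, assuming a bounded job queue (this is not poser's threadpool). One semaphore counts free slots and another counts queued jobs. The mutex is held only while the ring indices are updated, and all blocking happens in `sem_wait()`, so no condition variable or predicate re-check loop is needed. The `job_queue`/`jq_*` names are hypothetical. Unnamed POSIX semaphores (`sem_init()`) are not implemented everywhere; macOS, for example, does not support them.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    void          **items;      /* ring buffer of job pointers */
    size_t          cap, head, tail;
    pthread_mutex_t lock;       /* held only while touching head/tail */
    sem_t           free_slots; /* counts empty slots; producers block here */
    sem_t           used_slots; /* counts queued jobs; workers block here */
} job_queue;

static bool jq_init(job_queue *q, size_t cap) {
    q->items = malloc(cap * sizeof *q->items);
    if (!q->items) return false;
    q->cap = cap;
    q->head = q->tail = 0;
    if (sem_init(&q->free_slots, 0, (unsigned)cap) != 0) { free(q->items); return false; }
    if (sem_init(&q->used_slots, 0, 0) != 0) {
        sem_destroy(&q->free_slots);
        free(q->items);
        return false;
    }
    pthread_mutex_init(&q->lock, NULL);
    return true;
}

/* Retry sem_wait() if a signal interrupts it. */
static void wait_sem(sem_t *s) {
    while (sem_wait(s) == -1 && errno == EINTR) { }
}

static void jq_push(job_queue *q, void *job) {
    wait_sem(&q->free_slots);          /* block while the queue is full */
    pthread_mutex_lock(&q->lock);
    q->items[q->tail] = job;
    q->tail = (q->tail + 1) % q->cap;
    pthread_mutex_unlock(&q->lock);
    sem_post(&q->used_slots);          /* wake one waiting worker, if any */
}

/* Remove the oldest job; the caller already holds a used_slots token. */
static void *jq_take(job_queue *q) {
    pthread_mutex_lock(&q->lock);
    void *job = q->items[q->head];
    q->head = (q->head + 1) % q->cap;
    pthread_mutex_unlock(&q->lock);
    sem_post(&q->free_slots);
    return job;
}

/* Blocking pop: what a worker thread would call in its loop. */
static void *jq_pop(job_queue *q) {
    wait_sem(&q->used_slots);
    return jq_take(q);
}

/* Non-blocking pop: returns false if the queue is empty. */
static bool jq_trypop(job_queue *q, void **job) {
    if (sem_trywait(&q->used_slots) != 0) return false;
    *job = jq_take(q);
    return true;
}

static void jq_destroy(job_queue *q) {
    sem_destroy(&q->used_slots);
    sem_destroy(&q->free_slots);
    pthread_mutex_destroy(&q->lock);
    free(q->items);
}
```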

https://github.com/Zirias/swad

#c #coding

GitHub - Zirias/swad: Simple Web Authentication Daemon

GitHub

How to handle a hash table?

“You didn’t just come up with a cool hash table ... You’ve actually completely wiped out a 40-year-old conjecture!”

#research #HashTable

https://www.quantamagazine.org/undergraduate-upends-a-40-year-old-data-science-conjecture-20250210/

Undergraduate Upends a 40-Year-Old Data Science Conjecture | Quanta Magazine

A young computer scientist and two colleagues show that searches within data structures called hash tables can be much faster than previously deemed possible.

Quanta Magazine