Ah, the riveting tale of data compression—where bytes come to die and Kindle formats get thrown around like confetti. 🤓💻 Apparently, this tome is free as long as you don't get capitalism involved, because sharing information is only cool if you're not making a dime. 🙃📚
https://mattmahoney.net/dc/dce.html #datacompression #freeinformation #kindleformats #techhumor #sharingknowledge #HackerNews #ngated
Data Compression Explained

📰 Oh, look! A 12-minute read on how #TimescaleDB compresses data, because obviously, everyone has time to delve into the wonders of #Hypercore and Columnar Storage. 🤔 But hey, if watching paint dry isn't your style, imagine the thrill of squeezing data "up to 98" – whatever that means! 🙄
https://roszigit.com/en/blog/timescaledb-compression-hypercore #DataCompression #ColumnarStorage #TechTrends #HackerNews #ngated
TimescaleDB Compression: Hypercore and Columnar Storage with up to 98% Ratio in PostgreSQL

TimescaleDB compression with Hypercore: columnar storage with up to a 98% compression ratio in PostgreSQL. Covers segmentby/orderby configuration and a benchmark for IoT and time-series workloads.
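
As a rough illustration of why segmenting and ordering matter for columnar compression, here is a minimal Python sketch. It is not TimescaleDB's implementation: the function names and the (device_id, timestamp, value) row layout are assumptions made up for this example. It groups rows by a segment key, sorts each group by time, and delta-encodes the timestamp column so that regular sampling intervals collapse into small repeated numbers.

```python
from collections import defaultdict


def segment_and_order(rows):
    """Group (device_id, timestamp, value) rows by device, sorted by timestamp.

    Hypothetical helper: mimics the idea of a segment key plus an order key.
    """
    segments = defaultdict(list)
    for device_id, timestamp, value in rows:
        segments[device_id].append((timestamp, value))
    return {device: sorted(items) for device, items in segments.items()}


def delta_encode(values):
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode by accumulating a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Interleaved readings from two devices, each sampled every 10 seconds.
rows = [
    ("a", 1000, 21), ("b", 1005, 40),
    ("a", 1010, 21), ("b", 1015, 41),
    ("a", 1020, 22), ("b", 1025, 41),
]

segments = segment_and_order(rows)
timestamps_a = [ts for ts, _ in segments["a"]]
encoded_a = delta_encode(timestamps_a)
print(encoded_a)  # the first timestamp followed by the constant sampling interval
```

Without the grouping step, the timestamp deltas would alternate between devices and be less regular, which is the intuition behind choosing segment and order columns carefully.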

PivCo-Huffman

The draft program of IFIP SEC '26 is now available: https://ifipsec.org/program.html

We will present our work on AMPhitryon, a covert channel amplification (and general data compression!) technique (cf. https://github.com/cdpxe/AMPhitryon).

#infosec #cybersecurity #compression #datacompression

IFIP TC11 SEC Conference 2026, 09-11 June, Perth, Australia

IFIP SEC conferences are the flagship events of the International Federation for Information Processing (IFIP) Technical Committee 11 (TC11) on Information Security and Privacy Protection in Information Processing Systems. The IFIP SEC conferences aim to bring together primarily researchers, but also practitioners from academia, industry and governmental institutions to elaborate and discuss IT Security and Privacy Challenges that we are facing today and will be facing into the future. Join us for our next event.

All of human cooking compressed into 2 megabytes

https://arxiv.org/abs/2605.22391

#HackerNews #Tech #DataCompression

Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings

We present Epicure, a family of three sibling skip-gram ingredient embeddings retrained from scratch on a multilingual recipe corpus. We aggregate 4.14M recipes from 11 sources spanning seven languages, English, Chinese, Russian, Vietnamese, Spanish, Turkish, Indonesian, German, and Indian-English, and normalise the raw ingredient strings to 1,790 canonical entries via an LLM-augmented pipeline. A 203,508-edge ingredient-ingredient NPMI graph and an 80,019-edge typed FlavorDB ingredient-compound graph, 2,247 typed compound nodes across 15 categories, seed three Metapath2Vec variants that share architecture and hyperparameters and differ only in the random-walk schema: Cooc walks the co-occurrence graph only, Chem walks the typed compound metapaths only, and Core blends both via injected ingredient-ingredient walks at controlled mixing, placing each model at a distinct point on the chemistry-vs-recipe-context spectrum.

WinRAR archiver, a powerful tool to process RAR and ZIP files

WinRAR provides full RAR and ZIP file support and can decompress CAB, GZIP, and other archive formats.

7-Zip 26.01 - Linux huge pages provide a solid 2.5–4.5% compression speedup on modern and cache-limited CPUs by reducing TLB overhead, but offer zero benefit for decompression or ancient hardware. #memorymanagement #x86 #hugepages #largepages #7zip #linux #compression #datacompression #benchmark #performance

KV Cache Compression 900000x Beyond TurboQuant and Per-Vector Shannon Limit

https://arxiv.org/abs/2604.15356

#HackerNews #KVCache #Compression #TurboQuant #ShannonLimit #DataCompression

Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit

Recent work on KV cache quantization, culminating in TurboQuant, has approached the Shannon entropy limit for per-vector compression of transformer key-value caches. We observe that this limit applies to a strictly weaker problem than the one that actually matters: compressing the KV cache as a sequence. The tokens stored in a KV cache are not arbitrary floating-point data -- they are samples from the exact formal language the model was trained on, and the model is by construction a near-optimal predictor of that language. We introduce sequential KV compression, a two-layer architecture that exploits this structure. The first layer, probabilistic prefix deduplication, identifies semantically equivalent shared prefixes across sessions using the trie metric d_T(s, s') = -log_2 P_M(s ^ s') from Probabilistic Language Tries (PLTs). The second layer, predictive delta coding, stores only the residual of each new KV vector from the model's own prediction of it, achieving a per-token entropy bound of H(KV_{i+1} | KV_{<=i}) <= H(token_{i+1} | token_{<=i}). We prove that at typical language model perplexity -- approximately 10-20 for fluent English text -- this bound is 3.3-4.3 bits on average per token position, compared to TurboQuant's 3 bits per vector component (with typical attention heads having 64-128 components). The theoretical compression ratio over TurboQuant is approximately 914,000x at the Shannon limit. Even at 1000x above the entropy floor -- a deliberately pessimistic worst-case overhead, two orders of magnitude above the 2-5x typical of practical source coders -- the ratio remains approximately 914x over TurboQuant, with compression improving rather than degrading as context length grows. The two layers are orthogonal and compose with existing per-vector quantization methods including TurboQuant.
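
The per-token figures quoted in the abstract follow from the standard relation between perplexity and entropy, H = log2(perplexity). A minimal sketch of that arithmetic follows; the function names are made up for this example, and it only reproduces the 3.3-4.3 bit range and the per-vector cost at 3 bits per component. It does not attempt to reproduce the 914,000x ratio, which depends on model dimensions the abstract does not spell out.

```python
import math


def bits_per_token(perplexity: float) -> float:
    """Entropy in bits per token implied by a perplexity, using perplexity = 2**H."""
    return math.log2(perplexity)


def bits_per_kv_vector(bits_per_component: float, components: int) -> float:
    """Storage cost of one quantized vector at a fixed bit width per component."""
    return bits_per_component * components


for ppl in (10, 20):
    print(f"perplexity {ppl}: {bits_per_token(ppl):.1f} bits per token")

for components in (64, 128):
    cost = bits_per_kv_vector(3, components)
    print(f"{components} components at 3 bits each: {cost:.0f} bits per vector")
```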
