🤡 Behold the groundbreaking #tszip, the text compressor that’s so advanced it requires a #supercomputer to run slower than your grandpa on dial-up! 😅 It claims to compress text like no other, provided you stick to the English language and have a spare RTX 4090 lying around for those lightning-fast 1 MB/s speeds. 🌟 Just don’t expect it to work with anything else, like, you know, the rest of the world. 🌎
https://www.bellard.org/ts_zip/ #textcompression #RTX4090 #techhumor #HackerNews #ngated
ts_zip: Text Compression using Large Language Models

#APLQuest 2014-02: Write a function that takes a character vector and removes the interior vowels from each word. (https://apl.quest/2014/2/ to test your solution)

#APL #TextCompression #StringManipulation

APL Quest 2014-2: How Tweet It Is

Write a function which takes a character vector and removes the interior vowels from each word.
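
For readers who don't write APL, here is a minimal Python sketch of the task above. It assumes a "word" is a run of ASCII letters, that the vowels are a, e, i, o, u in either case, and that "interior" means any position other than the first or last letter of a word. The function names are made up for illustration, and this is not a reference solution from the APL Quest site.

```python
import re

VOWELS = set("aeiouAEIOU")


def squeeze_word(word):
    """Drop vowels from every position except the first and last letter."""
    if len(word) <= 2:
        return word  # no interior positions to touch
    interior = "".join(c for c in word[1:-1] if c not in VOWELS)
    return word[0] + interior + word[-1]


def remove_interior_vowels(text):
    """Apply squeeze_word to each run of ASCII letters, leaving the rest as-is."""
    return re.sub(r"[A-Za-z]+", lambda m: squeeze_word(m.group()), text)


if __name__ == "__main__":
    print(remove_interior_vowels("if you can read this, it worked"))
    # prints: if yu cn rd ths, it wrkd
```

Splitting on letter runs keeps spaces and punctuation exactly where they were, which matters if the result is meant to stay readable.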

🤖⚙️ Oh boy, another groundbreaking #GitHub project promising to revolutionize the world by... *checks notes*... compressing text with #AI. 🙄 Just what the tech world needed: more buzzwords and a flashy menu toggle. 🚀💥
https://github.com/deepseek-ai/DeepSeek-OCR #Innovation #TechBuzz #Projects #TextCompression #HackerNews #ngated
GitHub - deepseek-ai/DeepSeek-OCR: Contexts Optical Compression

GitHub

So @rl_dane introduced #bzip3 to me to use instead of #bzip2. Let's turn some bz2 files into bz3 to see the difference.

First example: 90k opus files

The Hey Snips wake word dataset. It has ~90k opus files in a 3.1GB tar file. bzip2 produces the same 3.1GB, which is expected since Opus audio is already compressed. bzip3 got it down to 3.0GB but used tons of computation power. Not worth the 100MB.

Second example: Windows 7 virtual box VM image

Windows7.vdi is a Windows 7 VM image for the "special" days. I think I have to get rid of it. But while it is still there, let's see how each performs. It is 16GB uncompressed. bzip2 -9 gives 7.0GB. bzip3 gives 6.3GB, but at the expense of about 3x the CPU time. Deleting all of them anyway. Down with Windows.

Third example: Pure XML text file

A pure XML file with Persian and English characters. Uncompressed it is 1.7GB. bzip2 -9 gives 276MB while bzip3 gives 260MB.

Final example: Creating a simple bomb

So I did this:

dd if=/dev/zero of=./justzero bs=1G count=16

So now I have a 16GB file containing only zero bytes. bzip2 -9 gives 672KB. bzip3 gives 46KB.
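
The same effect can be reproduced at a smaller scale with Python's standard `bz2` module. This is only a sketch: it covers bzip2 (bzip3 is not in the Python standard library), it uses 16 MiB of zeros as a stand-in for the 16GB file, and the helper name `compressed_size_of_zeros` is made up for illustration.

```python
import bz2


def compressed_size_of_zeros(total_bytes, chunk_size=1024 * 1024, level=9):
    """Stream `total_bytes` zero bytes through bzip2 and return the output size."""
    compressor = bz2.BZ2Compressor(level)
    chunk = bytes(chunk_size)  # bytes(n) is n zero bytes
    out_size = 0
    remaining = total_bytes
    while remaining > 0:
        piece = chunk if remaining >= chunk_size else bytes(remaining)
        out_size += len(compressor.compress(piece))
        remaining -= len(piece)
    out_size += len(compressor.flush())  # emit whatever is still buffered
    return out_size


if __name__ == "__main__":
    total = 16 * 1024 * 1024  # 16 MiB, an assumed stand-in for the 16GB file
    size = compressed_size_of_zeros(total)
    print(f"{total} zero bytes -> {size} bytes with bzip2 level {9}")
```

Feeding the compressor in chunks keeps memory use flat, so the same function could be pointed at a much larger total without building the whole input in RAM.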

Conclusion

Thank you @rl_dane

Real nice thing!

#compression #gzip #zip #filecompression #textcompression #datacompression #linux #unix #tech

So I was short on storage on my archive drive. I saw the LibreWolf source code there. It was a tar.gz of ~800MB. I uncompressed it, then recompressed it with bzip2 -9, and now it's ~600MB. Generally #bzip2 compresses this kind of data better than #gzip.

Edit: But don't use bzip2 -9 everywhere. Sometimes -4 gives the same size as -9, with the latter being tons slower. Also, there is pbzip2 for using all your CPU cores.
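
One likely reason -4 and -9 can match: the bzip2 level sets the block size in units of 100k bytes (-4 is 400k, -9 is 900k), so an input that fits inside a single 400k block is compressed the same way at both levels. The Python sketch below checks this with the standard `bz2` and `gzip` modules. The sample text is synthetic and the function names are made up for illustration, so the numbers say nothing about real source trees.

```python
import bz2
import gzip
import random


def sample_text(n_words, seed=0):
    """Build reproducible pseudo-text from a small made-up vocabulary."""
    rng = random.Random(seed)
    vocab = ["tar", "gzip", "bzip", "block", "source", "archive",
             "drive", "level", "core", "storage", "linux", "unix"]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode("ascii")


def size_report(data):
    """Return the compressed size of `data` for a few codec/level pairs."""
    return {
        "raw": len(data),
        "gzip -9": len(gzip.compress(data, compresslevel=9)),
        "bzip2 -4": len(bz2.compress(data, compresslevel=4)),
        "bzip2 -9": len(bz2.compress(data, compresslevel=9)),
    }


if __name__ == "__main__":
    data = sample_text(20_000)  # well under 400k bytes, so it fits one -4 block
    for name, size in size_report(data).items():
        print(f"{name:>9}: {size} bytes")
```

For inputs larger than 400k bytes the two levels can diverge, since -9 lets the compressor see more data per block.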

#compression #compressionalgorithm #textcompression #opensource #tech #linux #unix #bsd #freebsd

🚫🔒 In today's episode of "I Can't Read This," we're presented with a groundbreaking revelation: text compression is hard when you can't even access the content! 🙈🔐 But hey, blame Rust for your server's secretive tendencies. #404NotFoundComedy
https://palaiologos.rocks/posts/rust-codecs/ #ICan'tReadThis #TextCompression #RustLang #ServerSecrets #404NotFound #HackerNews #ngated
Why do I find Rust inadequate for codecs?

With memory safety becoming more important in the world of computer science, I offer a brief opinion piece on why I choose to implement codecs in C.

Palaiologos
Text Compression Gets Weirdly Efficient With LLMs

It used to be that memory and storage space were so precious and so limited of a resource that handling nontrivial amounts of text was a serious problem. Text compression was a highly practical app…

Hackaday