We’re All So F’d | NVIDIA x Palantir, Global Surveillance, "Pre-Crime" Arrests, & AI


How to Create an Offline Version of Websites Using Kiwix and ZIM Files

#zim #kiwix #offline #SelfHosting #DataHoarding #WebScraping

https://dev.to/free_programmers/how-to-create-an-offline-version-of-websites-using-kiwix-and-zim-files-3d9b

> Access web content without the Internet — wherever you go.

https://wiki.openzim.org/wiki/Zimit

https://github.com/openzim/zimit

⚙️ Optional: Create Your Own ZIM File

Don’t see the website you want in the Kiwix library? No problem!

Use Zimit to generate your own ZIM file.

Steps:

https://noted.lol/convert-any-website-into-a-zim-file-zimit/

💡 Note: Not all websites are easily portable to ZIM format, especially dynamic sites with login systems or lots of JavaScript.
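Zimit is usually run through its published Docker image; a minimal sketch, assuming Docker is installed (flag names have changed between releases, e.g. older versions take `--url` where newer ones take `--seeds`, so check the image's own `--help`):

```shell
# Hedged sketch of a Zimit crawl via the official Docker image.
# example.com and example-site are placeholders; verify flags with:
#   docker run ghcr.io/openzim/zimit zimit --help
mkdir -p output
docker run -v "$PWD/output:/output" ghcr.io/openzim/zimit zimit \
  --seeds https://example.com \
  --name example-site
# When the crawl finishes, the .zim file in ./output opens in any Kiwix reader.
```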

RE: https://mastodon.social/@Migueldeicaza/115713071946346627

Holy crap. What a case study in why it is important to decouple data and devices from corporate monopolies!

#apple #FOSS #fosstodon #datahoarding #homelab

Data Hoarding - FAQ

An index of resources and archives related to data hoarding, web archival and digital preservation.

I'm setting up my fileserver again. Previously I used MergerFS and 4x4 TB NAS drives to aggregate the files on all the drives as one mountpoint. The nice thing about mergerfs is that the disks are completely separate (files are written whole to each disk - nice!). The data is not important at all to me and is backed up, so I don't need RAID etc. Any suggestions for an alternative approach? I hear ZFS pools are interesting, but I've never used ZFS. #linux #mergerfs #zfs #datahoarding
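For comparison's sake, the mergerfs setup described above fits in one fstab line; a sketch with made-up device paths and a common option set (check the mergerfs docs for the create policies your version supports):

```shell
# Hypothetical /etc/fstab entry pooling four data disks into /mnt/pool.
# category.create=mfs sends each new file to the branch with the most
# free space; every file still lives whole on exactly one disk.
#
#   /mnt/disk1:/mnt/disk2:/mnt/disk3:/mnt/disk4  /mnt/pool  fuse.mergerfs  defaults,allow_other,use_ino,category.create=mfs  0 0

# Equivalent one-off mount for testing:
mergerfs -o defaults,allow_other,use_ino,category.create=mfs \
  /mnt/disk1:/mnt/disk2:/mnt/disk3:/mnt/disk4 /mnt/pool

# A plain striped ZFS pool, by contrast, gives one checksummed dataset,
# but losing any single disk loses the whole pool -- unlike mergerfs,
# where the surviving disks keep their files:
#   zpool create tank /dev/sda /dev/sdb /dev/sdc /dev/sdd
```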

Quite happy to have found a few utilities that are helpful in sifting through a full storage device.

- ncdu (NCurses Disk Usage)
https://dev.yorhel.nl/ncdu

- ripgrep & ripgrep-all
https://github.com/BurntSushi/ripgrep
https://github.com/phiresky/ripgrep-all
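Typical invocations for the two tools above (paths and patterns are just examples):

```shell
# ncdu: interactive disk-usage browser; -x stays on one filesystem,
# -o/-f export a scan and reload it later without rescanning.
ncdu -x /
ncdu -x -o scan.json /srv/data
ncdu -f scan.json

# ripgrep-all: like ripgrep, but also searches inside PDFs, archives,
# office documents, etc.
rga 'backup policy' ~/archive
```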

Thanks @fschaap for the suggestions for finding duplicate data. Looking into them. I also found fslint.

Thanks @nicorikken for the idea of having a shell script to monitor changes & a trigger for cleaning up. Will try this as well.
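A monitor-and-clean script of the kind suggested could be as small as a size check plus an age-based prune; a sketch (the directory, threshold, and age cutoff are made-up examples, and it deletes files, so point it somewhere safe first):

```shell
#!/bin/sh
# Sketch: if a directory has grown past a size threshold, prune regular
# files older than a cutoff. All parameters here are illustrative.
prune_old_files() {
    dir=$1      # directory to watch
    max_kb=$2   # size threshold in kilobytes
    days=$3     # delete regular files older than this many days
    used_kb=$(du -sk "$dir" | cut -f1)
    if [ "$used_kb" -gt "$max_kb" ]; then
        find "$dir" -type f -mtime +"$days" -print -delete
    fi
}
```

Run it from cron (or a systemd timer) to get the "monitor changes & trigger cleanup" behavior without a daemon.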

#Linux #Tech #Data #DataHoarding #Cli #Sysadmin


BitTorrent's DHT and the Leading ISP Networks Helping to Keep it Alive | TorrentFreak

Based on the volume of IP addresses seen in the network, customers of relatively few ISPs dominate BitTorrent's Distributed Hash Table.

Bleeping Computer: American Archive of Public Broadcasting fixes bug exposing restricted media. “A vulnerability in the American Archive of Public Broadcasting’s website allowed downloading of protected and private media for years, with the flaw quietly patched this month. BleepingComputer was tipped about the flaw by a cybersecurity researcher who asked to remain anonymous, stating that the […]”

https://rbfirehose.com/2025/09/24/bleeping-computer-american-archive-of-public-broadcasting-fixes-bug-exposing-restricted-media/


#wikipedia #datahoarding

whoops, spent all night downloading Wikipedia articles to read on my offline Kindle Paperwhite: level 1 and 2 vital articles (https://en.wikipedia.org/wiki/Wikipedia:Vital_articles), and a lot of random stuff on computing/psychology/politics

I picked pre-ChatGPT revisions (2022-11-29 or earlier) for shits n giggles, but Wikipedia's built-in PDF downloads were broken for older article revisions in my experience, so I had the choice of using wget or SingleFile. I picked the former and just cleaned up my filenames
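The filename cleanup at the end can be a one-liner; a sketch, assuming wget saved each page under its query-string name like `index.php?title=Some_Article&oldid=NNNN` (the patterns are illustrative for that URL shape, adapt them to your actual output):

```shell
# Turn a wget-saved name like 'index.php?title=Alan_Turing&oldid=1234'
# into a readable article title. Patterns are examples for this URL shape.
sanitize() {
    printf '%s\n' "$1" \
      | sed -e 's/^index\.php?title=//' \
            -e 's/&oldid=[0-9]*$//' \
            -e 's/_/ /g'
}

# e.g., in the download directory:
#   for f in index.php\?title=*; do mv -- "$f" "$(sanitize "$f").html"; done
```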