RE: https://wisskomm.social/@ioer/115899330915763542

I really took a deep dive into #datashader with this map: Locals & Tourists in Germany, derived from 67 million geo-social media posts (2007-2022). The data includes publicly shared posts from Instagram, Flickr, Twitter and iNaturalist.

I always wanted to create such a map, following in the footsteps of Eric Fischer's Locals & Tourists dataset from 2011 [1].

The code for producing this map is here [2], and the full repository is here [3]. It includes some neat methods for various #geospatial processing tasks in #Python, such as exporting a datashader map to a #GeoTiff [4] with the help of #Xarray and #Rasterio.

Finally, all of this was created in a privacy-preserving way using #HyperLogLog, which allowed me to share the code and abstracted data publicly for full reproducibility and transparency. [6] #FAIR

Below you'll find the link to the (quite succinct) publication in Natur und Landschaft in Karten (#NuL) [5].

[1]: https://www.flickr.com/photos/walkingsf/albums/72157624209158632
[2]: https://code.ad.ioer.info/wip/digital_traces_map/html/03_visualization.html
[3]: https://gitlab.hrz.tu-chemnitz.de/ad/digital_traces_map/
[4]: https://gitlab.hrz.tu-chemnitz.de/s7398234--tu-dresden.de/base_modules/-/blob/main/raster.py?ref_type=heads#L78
[5]: https://www.nul-online.de/article-7301410-1111/landschaft-und-natur-in-karten-.html
[6]: https://doi.org/10.71830/VDMUWW

#FOSS breaks down barriers and makes innovation more accessible to everyone, worldwide. Roberto Luna Rojas from #Valkey shares why #opensource matters to him.

Learn more about #vectors, #hyperloglog, #Redis, and how to improve your observability with key-value datastores: https://t.ly/ZnTNX

#Linux #observability #kubernetes #softwarelibre #freesoftware

Counting Millions of Things with Kilobytes
A Hands-On Quarkus Tutorial for Scalable Unique Counting with HyperLogLog
https://myfear.substack.com/p/quarkus-hyperloglog-unique-counting-java
#Java #Quarkus #GitHub #HyperLogLog

Ever wonder what a HyperLogLog data structure is? (Who hasn’t!?) In our latest video, learn how Dragonfly implements this memory-efficient counter to track millions of unique users with just 1.5KB of memory. https://youtu.be/EIJbC9lxzts #DragonflyDB #HyperLogLog
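
For a rough feel of where a figure like 1.5KB can come from, here is a small back-of-the-envelope sketch. The post does not state the parameters used in the video, so the precision `p=11` and the 6 bits per register are assumptions chosen only because they happen to give 1536 bytes. The `1.04 / sqrt(m)` standard error is the figure from the original HyperLogLog analysis. `hll_footprint` is a hypothetical helper name.

```python
import math


def hll_footprint(p, bits_per_register=6):
    """Return (size in bytes, relative standard error) for 2**p registers."""
    m = 1 << p                                   # number of registers
    size_bytes = m * bits_per_register // 8      # densely packed registers
    std_error = 1.04 / math.sqrt(m)              # typical HyperLogLog error
    return size_bytes, std_error


# p=11 is an assumed setting; p=14 corresponds to 16384 registers.
for p in (11, 14):
    size_bytes, std_error = hll_footprint(p)
    print(f"p={p}: {size_bytes} bytes, about {std_error:.1%} standard error")
# p=11: 1536 bytes, about 2.3% standard error
# p=14: 12288 bytes, about 0.8% standard error
```

The trade-off is the usual one: each extra bit of precision doubles the memory and shrinks the error by a factor of about 1.4.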

@bkastl Hm, feel you!

I do work in this field, and until now I have always been a big fan of the Ethikrat (the German Ethics Council).

Perhaps she should become an advocate of #HyperLogLog.

https://media.ccc.de/v/38c3-privacy-preserving-health-data-processing-is-possible

#38c3 #Patientenakte

Completed the first assignment of #645 @CMUDB. HyperLogLog was an interesting data structure to learn about.

#hyperloglog #presto

Google's HyperLogLog++

This is a follow-up to yesterday's post, "Redis's space-saving implementation of HyperLogLog". Redis's HyperLogLog implementation mentions Google's paper "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", which proposes HyperLogLog++ (HLL++).

The paper proposes three main improvements. The first is the use of a 64-bit hash function:

5.1 Using a 64 Bit Hash Fu

https://blog.gslin.org/archives/2024/03/21/11709/google-%e7%9a%84-hyperloglog/

#Computer #Murmuring #Programming #algorithm #data #google #hll #hyperloglog #structure
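
One way to see what the switch to a 64-bit hash costs is to work out the register width. This is a small sketch rather than code from the paper: `register_bits` is a hypothetical helper, and `p=14` (16384 registers) is just an illustrative precision. Each register stores the largest rank seen, where the rank is the position of the first 1-bit in the hash bits left over after `p` bits have been used to pick the register.

```python
def register_bits(hash_bits, p):
    """Bits needed per register when p hash bits select the register."""
    max_rank = hash_bits - p + 1       # rank when all remaining bits are zero
    return max_rank.bit_length()       # bits needed to store values 0..max_rank


# p=14 is an assumed, illustrative precision.
for hash_bits in (32, 64):
    print(hash_bits, register_bits(hash_bits, p=14))
# 32 5
# 64 6
```

So the wider hash costs one extra bit per register, in exchange for pushing hash collisions far beyond the roughly 4 billion distinct values where a 32-bit hash starts to run out.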

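Redis's space-saving implementation of HyperLogLog

HyperLogLog (HLL) is a data structure and algorithm that solves the count-distinct problem statistically: it does not aim for an exact result, only an approximate count.

The algorithm is actually not hard to understand. In the original 2007 paper, "HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm", you can read that it looks like this:

You can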

https://blog.gslin.org/archives/2024/03/20/11705/redis-%e5%b0%8d-hyperloglog-%e7%9c%81%e7%a9%ba%e9%96%93%e7%9a%84%e5%af%a6%e4%bd%9c/

#Computer #Murmuring #Software #algorithm #count #data #distinct #hyperloglog #problem #redis #structure

#HyperLogLog is super clever.

It can count any number of unique values in constant space (i.e. without storing the values) within a specified margin of error.

And HLLs can be merged to count the number of unique values across both sets! So you can quickly count something like "unique requests per day", and combine these into "per month" and "per year", without storing a year's worth of history.
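
To make the counting and merging concrete, here is a minimal pure-Python sketch of the idea. It is not taken from any particular library: the `HyperLogLog` class, the SHA-1 based 64-bit hash, and `p=12` (4096 registers, roughly 1.6% error) are all assumptions made for illustration, and it only includes the small-range correction.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m          # fixed size, whatever the input

    def add(self, value):
        # 64-bit hash from the first 8 bytes of SHA-1 (an arbitrary choice).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)        # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)      # small-range correction
        return raw

    def merge(self, other):
        """Return a sketch of the union: the register-wise maximum."""
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged


# Two "days" of made-up user IDs with 20,000 users in common.
day1, day2 = HyperLogLog(), HyperLogLog()
for i in range(0, 60_000):
    day1.add(f"user-{i}")
for i in range(40_000, 100_000):
    day2.add(f"user-{i}")

both = day1.merge(day2)   # estimates the 100,000 distinct users overall
print(round(day1.count()), round(day2.count()), round(both.count()))
```

Because merging is just a register-wise maximum, the merged sketch is identical to one built from all values directly, which is why daily sketches can be rolled up into monthly and yearly ones.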