🚨 New feature alert!
You can now download BOLD data packages in Parquet format, which makes it easier to work with large datasets in R and improves interoperability with downstream analytics.
Visit boldsystems.org/data/data-packages to get started.
#Parquet #Interoperability #BOLDSystems
https://bsky.app/profile/boldsystems.bsky.social/post/3mi4s7zvfhr2k

Parquet gave data lakes a common language: columnar layout, good compression, and fast scans. That still works well for classic analytics. But workloads have changed. We now mix wide scans with point lookups, handle embeddings and images, and run on S3-first stacks. On NVMe you want lots of tiny random reads. On S3 you want fewer, larger range requests. A format tuned for one world can feel chatty or slow in the other.
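The NVMe-versus-S3 trade-off above can be sketched in a few lines of Python. This is an illustration only, not how any particular Parquet reader plans its I/O: the function name `coalesce_ranges`, the `max_gap` parameter, and the byte offsets are all hypothetical. The idea is that a reader knows which byte ranges it needs (for example, column chunks) and can either fetch each one separately or merge nearby ranges into fewer, larger requests at the cost of reading some unused bytes.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start_offset, length), both in bytes


def coalesce_ranges(ranges: List[Range], max_gap: int) -> List[Range]:
    """Merge byte ranges whose gap to the previous range is at most max_gap."""
    merged: List[Range] = []
    for start, length in sorted(ranges):
        end = start + length
        if merged:
            prev_start, prev_len = merged[-1]
            prev_end = prev_start + prev_len
            if start - prev_end <= max_gap:
                # Close enough: extend the previous request to cover this range.
                merged[-1] = (prev_start, max(prev_end, end) - prev_start)
                continue
        merged.append((start, length))
    return merged


# Hypothetical byte ranges of four column chunks inside one file.
chunks = [(0, 4_000), (5_000, 3_000), (1_000_000, 2_000), (1_003_000, 1_000)]

# NVMe-style plan: no merging across gaps, many small reads.
nvme_plan = coalesce_ranges(chunks, max_gap=0)

# S3-style plan: tolerate up to 64 KiB of wasted bytes to save round trips.
s3_plan = coalesce_ranges(chunks, max_gap=64 * 1024)

print(len(nvme_plan), "requests on NVMe;", len(s3_plan), "requests on S3")
```

With these made-up offsets the first plan keeps four separate reads, while the second collapses them into two larger range requests. The right `max_gap` depends on request latency and per-request cost, which is why one fixed layout can feel chatty on one storage tier and slow on the other.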
Guide: How to work with the PARQUET format
Last year we began publishing data in the «Если быть точным» catalog in Parquet format. It was created by engineers at Twitter and Cloudera in 2013, and today it has become a standard for storing analytical data: it is used by Google, Amazon, Netflix, and most modern data platforms. In this guide we explain how to work efficiently with Parquet data using Python.
Hacker News archive (47M+ items, 11.6GB) as Parquet, updated every 5m
https://huggingface.co/datasets/open-index/hacker-news
#HackerNews #Archive #Parquet #Data #47MItems #UpdatedEvery5m
scrapy-contrib-bigexporter 1.1.0 released. Export data scraped with Scrapy in Parquet, Avro, ORC, or Iceberg format. Changes: CI/CD pipeline on Codeberg Actions, updated Actions, strict schema applied to the Arrow table if a schema is provided.
Munquet 0.2.1 just landed on Flathub 🚀
Fixed a small race condition when canceling a conversion — turns out the process could finish right before you clicked “Yes” 😅
Two lines later… all good.
https://flathub.org/en/apps/io.gitlab.zulfian1732.munquet
#Flatpak #GTK4 #OpenSource #Parquet #DataScience #Linux #Python #PyArrow