Munquet 0.2.1 just landed on Flathub πŸš€

Fixed a small race condition when canceling a conversion β€” turns out the process could finish right before you clicked β€œYes” πŸ˜…

Two lines later… all good.

https://flathub.org/en/apps/io.gitlab.zulfian1732.munquet

#Flatpak #GTK4 #OpenSource #Parquet #DataScience #Linux #Python #PyArrow

Munquet is now officially on Flathub πŸŽ‰

A native Linux app that converts datasets into Apache Parquet using the PyArrow backend. Perfect for data science workflows, analytics, and anyone who needs fast local conversions.

Get it here: https://flathub.org/en/apps/io.gitlab.zulfian1732.munquet

@gnome @xfce @kde @GTK @linux @flathub

#apache #pyarrow #datascience #parquet #csv #OpenSource #Python #GNOME #GTK4 #Adwaita

πŸš€ Munquet β€” Convert, merge, rename & validate tabular data into Parquet, fully offline & batch-ready.

GitLab: https://gitlab.com/zulfian1732/munquet

Featured in: @severo 's Awesome Parquet: https://github.com/severo/awesome-parquet πŸ™

#Parquet #OpenSource #Python #GNOME #GTK4 #Adwaita #PyArrow

πŸš€ Sneak peek at Munquet!
Convert, merge, rename, and validate tabular data safely into Parquet. Works offline, with batch processing and progress feedback.

GitLab repo:

https://gitlab.com/zulfian1732/munquet

Flathub release coming soon!

#Python #GTK4 #GNOME #PyArrow #Parquet #DataScience #Libadwaita

Released scrapy-contrib-bigexporters 1.0.0 (https://codeberg.org/ZuInnoTe/scrapy-contrib-bigexporters) - additional export formats for the web scraping framework Scrapy.

Migrated the Parquet export from fastparquet to pyarrow, as fastparquet is deprecated (https://docs.dask.org/en/stable/changelog.html#fastparquet-engine-deprecated)

Migrated the ORC export from pyorc to pyarrow to reduce the number of dependencies

#scrapy #crawling #python #parquet #orc #pyarrow #webcrawling #scraping

If the purpose of a library is to "process and transport large data sets" but the code base contains an error message like "array cannot contain more than 2147483646 bytes", then there must be a big misunderstanding somewhere. #pyarrow
Easily obtain OSM and OMF data: #Python and CLI tools #QuackOSM and #OvertureMaestro offer easier access to data from #OpenStreetMap (#OSM) and the Overture Maps Foundation (#OMF) through #PyArrow, #GeoParquet, or #DuckDB. These tools can simplify large-scale geospatial data tasks for seamless data engineering and analysis.
https://spatialists.ch/posts/2025/05/23-easily-obtain-osm-and-omf-data/ #GIS #GISchat #geospatial #SwissGIS
The PyArrow Revolution

Pandas is at the core of virtually all data science done in Python, which is to say, virtually all data science. Since its beginning, Pandas has been based on NumPy, but changes are afoot to update those internals, and you can now optionally use PyArrow. PyArrow comes with a ton of benefits, including a columnar format that makes answering analytical questions faster, support for a range of high-performance file formats, inter-machine data streaming, faster file I/O, and more. Reuven Lerner is here to give us the low-down on the PyArrow revolution.

Currently taking a look at refreshing some of the #ApacheArrow and #PyArrow docs, so if you use Arrow in #rstats or Python and there are any areas you'd like to understand better, give me a shout, and we'll see what we can do!