Ivan Begtin

I am the founder of the APICrafter startup and the Infoculture NGO, and the creator of open data and data engineering projects.
Website: https://begtin.tech
Longreads: https://begtin.substack.com
Telegram channel: https://t.me/begtin
Github: https://github.com/ivbeg

Several major updates to the Common Data Index registry: it now includes 9635 data catalogs. Most of them are in the US, and most are geoportals/geocatalogs such as ArcGIS Hub, ArcGIS Server, #geonetwork, etc. #CKAN is the most popular open data software.

My personal KPI for this registry is to reach at least 13000 data catalogs. That is the number of data sources in Google Dataset Search (GDS), even if GDS consumes different data sources.

Registry code and data at https://github.com/commondataio/dataportals-registry
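
As a rough illustration of the numbers above, here is a minimal Python sketch that tallies catalog records by software and country and measures progress toward the 13000 target. The record fields (`software`, `country`) and the sample entries are hypothetical and are not taken from the registry's actual schema.

```python
from collections import Counter

# Hypothetical sample records; the real registry schema may differ.
SAMPLE_CATALOGS = [
    {"name": "Example Hub", "software": "ArcGIS Hub", "country": "US"},
    {"name": "Example Geo", "software": "GeoNetwork", "country": "FR"},
    {"name": "Example Open Data", "software": "CKAN", "country": "US"},
    {"name": "Example City Data", "software": "CKAN", "country": "CA"},
]


def count_by(records, field):
    """Count records per value of `field`; missing values count as 'unknown'."""
    return Counter(rec.get(field, "unknown") for rec in records)


def progress(current, target):
    """Return progress toward `target` as a percentage, rounded to one decimal."""
    return round(100.0 * current / target, 1)


if __name__ == "__main__":
    print(count_by(SAMPLE_CATALOGS, "software").most_common())
    print(count_by(SAMPLE_CATALOGS, "country").most_common())
    # Figures from the post: 9635 catalogs so far, 13000 as the goal.
    print(progress(9635, 13000))
```

With the figures from the post, the registry is roughly three quarters of the way to the target.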

GitHub - commondataio/dataportals-registry: Registry of data portals, catalogs, data repositories including data catalogs dataset and catalog description standard


Hi everyone!

So, as the first part of this project, we are collecting and building a Common Data Index that will be the foundation of the future dataset search engine. Common Data Index is similar to Common Crawl but focused on data catalogs. It will be an open-source, open-data, openly licensed registry and index of all datasets in data catalogs.

The registry of data catalogs is already open-sourced: https://github.com/commondataio/dataportals-registry

Please let me know if you are interested in cooperation.


A nice visualization of the cost of renting a square meter of housing in France. The author, Boris Mericskay, wrote in the comments [1] that a gradation from 20 to 30 euros should be added for Paris, but overall the map is very clear. The visualization is built on open government data from France's Ministry of Ecological Transition (Ministère de la Transition écologique) [2].

References:
[1] https://twitter.com/BorisMericskay/status/1607437455656902657/photo/1
[2] https://www.data.gouv.fr/fr/datasets/carte-des-loyers-indicateurs-de-loyers-dannonce-par-commune-en-2022/

#opendata #france #datasets

Boris Mericskay on Twitter

“🏬 Apartment #rents across France in one map, thanks to the collection and processing of 7 million rental listings by @Anil_Officiel ➡ Datasets available as #opendata on @datagouvfr (https://t.co/w7DxmgiWD8)”

Twitter

I forgot to mention that a couple of months ago I started, and have almost completed, a small Python library for reading data from files in many data formats: CSV, JSON, JSON Lines, XML, Parquet, ORC, XLS, XLSX, with others to come. It is called pyiterable [1]; it reproduces and enhances the code found in the undatum command-line utility [2] and the datacrafter [3] ETL engine.

Links:
[1] https://github.com/apicrafter/pyiterable
[2] https://github.com/datacoon/undatum
[3] https://github.com/apicrafter/datacrafter

#datatools #data
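
To show the general idea of reading records from different file formats through one interface, here is a minimal sketch using only the Python standard library. It is not pyiterable's API: the function names (`iter_csv`, `iter_jsonl`, `iter_records`) and the extension-based dispatch are assumptions made for illustration, and only CSV and JSON Lines are covered.

```python
import csv
import json
from pathlib import Path
from typing import Any, Callable, Dict, Iterator


def iter_csv(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield each CSV row as a dict keyed by the header row."""
    with path.open(newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            yield dict(row)


def iter_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON value per non-empty line."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical dispatch table: file extension -> reader function.
READERS: Dict[str, Callable[[Path], Iterator[Dict[str, Any]]]] = {
    ".csv": iter_csv,
    ".jsonl": iter_jsonl,
    ".ndjson": iter_jsonl,
}


def iter_records(filename: str) -> Iterator[Dict[str, Any]]:
    """Return a record iterator chosen by the file extension."""
    path = Path(filename)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"unsupported format: {path.suffix!r}")
    return reader(path)
```

A caller can then write `for record in iter_records("data.jsonl"): ...` without caring which format sits behind the file name. Formats such as Parquet, ORC or XLSX would need third-party readers added to the dispatch table.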

GitHub - apicrafter/pyiterable: Python library to read, write and convert data files with formats BSON, JSON, NDJSON, Parquet, ORC, XLS, XLSX and XML
