Ivan Begtin

I am the founder of the APICrafter startup and the Infoculture NGO, and the creator of open data and data engineering projects.
Website: https://begtin.tech
Longreads: https://begtin.substack.com
Telegram channel: https://t.me/begtin
GitHub: https://github.com/ivbeg

Several major updates to the Common Data Index registry: it now includes 9635 data catalogs. Most of them are in the US, and they are geoportals/geocatalogs such as ArcGIS Hub, ArcGIS Server, #geonetwork, etc. #CKAN is the most popular open data software.

My personal KPI for this registry is to reach at least 13000 data catalogs. That is the number of data sources in Google Dataset Search (GDS), even if GDS consumes different data sources.

Registry code and data at https://github.com/commondataio/dataportals-registry

The number of Chinese open data portals is snowballing, especially open research data projects such as Scidb.cn and findata.cn, but provincial and city open data portals too.
I've just published an Awesome list of open data software: https://github.com/commondataio/awesome-opendata-software It lists most open source and commercial software and SaaS services for publishing open data. #opendata #datacatalogs
Does anyone know about the Chinese search engine findata.cn? According to the description, they focus on open data in science, but I could not find any roadmap or architecture.

Hiring a software engineer in their late 40s:

Pros:
* Understands your stack better than you do after glancing through the repo for five minutes.
* Will rewrite said stack 2x as fast, and half as buggy if you let them.

Cons:
* Gives zero fucks.
* Knows we're not *really* like family here.
* No, seriously, absolutely zero fucks given.

Do not cite the deep magic to me, product manager, I was there when it was written.

Maybe someone could help me: I am looking for a list or database of data/content licenses worldwide. Not just open licenses, but non-open/non-free licenses too. I have a lot of license titles/URLs/names/IDs collected from 1000+ CKAN, DKAN, Dataverse, Invenio, and other types of public data catalogs. Most of these licenses are CC-derived, but not all of them, and I can't find a reference list of licenses anywhere to map them to.
I am happy to announce the Open Data Armenia community and data catalog https://opendata.am/2023/05/24/open-data-armenia-project-announcement/
It's dedicated to Armenia and to Armenian history, culture and language. We will prepare and publish Armenia-related datasets regularly. #opendata #armenia #opengov

I've finished the initial part of the Common Data Index project: a registry of data catalogs. The registry, at https://github.com/commondataio/dataportals-registry, includes 2000+ data catalogs for now. The next step is to collect and index available metadata. #opendata #datasets #commondataindex

The most common public data portal software: CKAN, ArcGIS Hub, Geonetwork, Dataverse, OpenDataSoft, Socrata, NADA, and InvenioRDM. Geonetwork is quite popular, but its instances aren't easy to find: there is no single list of them, except the one that I maintain.

#opendata #data #datasets

I've been manually analysing about 1.5k data portals and catalogues and I've found a few insights I'd like to share.

First of all, I see that up to 75% of all open government data is currently geospatial data. I see an increasing number of Geonetwork and ArcGIS Hub installations.

Second, I see a wave of open research data repositories with Dataverse, InvenioRDM and a lot of custom software.

I don't know yet what to do with all this knowledge. But maybe you have some ideas?
#opendata