Looked something up in the EU data portal and happened to glance at the metadata quality of one of our datasets. Excellent 😀

Now I just need to find out why the contact information isn't coming through correctly.

https://data.europa.eu/data/datasets/c0b506d1-57ba-4088-a257-0d8244256248/quality

#OpenData #DataQuality #metadata

All data is wrong, but some data is wrong on multiple levels. That's why the Data Validation Report Format (DVRF) supports precise error locations and nested errors. Version 1.0.0 of the specification has just been published at https://doi.org/10.5281/zenodo.20792191 and https://gbv.github.io/data-validation-report-format/ #dataquality
Data Validation Report Format

This document specifies a data format to report validation errors of digital objects with error positions independent from specific document models

Zenodo
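
To make "precise error locations and nested errors" concrete, here is a minimal sketch of an error report in which each error carries a position and may contain child errors. The field names (`message`, `position`, `errors`) and the position keys (`line`, `jsonpointer`) are illustrative assumptions, not taken from the DVRF specification; consult the published spec for the actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationError:
    # All field names here are hypothetical, chosen for illustration only.
    message: str
    position: dict[str, str] = field(default_factory=dict)
    errors: list["ValidationError"] = field(default_factory=list)


def flatten(error, depth=0):
    """Walk an error and its nested errors depth-first, yielding (depth, error)."""
    yield depth, error
    for child in error.errors:
        yield from flatten(child, depth + 1)


# One record-level error that wraps a more specific field-level error.
report = ValidationError(
    message="record 3 is invalid",
    position={"line": "3"},
    errors=[
        ValidationError(
            message="year must be an integer",
            position={"line": "3", "jsonpointer": "/year"},
        )
    ],
)

for depth, err in flatten(report):
    print("  " * depth + f"{err.message} @ {err.position}")
```

Nesting lets a consumer show the coarse error first (which record failed) and drill down to the exact field only when needed.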

Bad time entries look harmless until you total them up. Ours added up to roughly $1M in losses in a single year.

So we built Pecas: hard rules catch the obvious mistakes, a binary classifier handles the messy free-text ones, and humans only review what gets flagged.

Here is the business case behind it: https://go.upgradejs.com/r7e

#MachineLearning #TimeTracking #DataQuality
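
The three-stage triage described in the Pecas post (hard rules, then a binary classifier, then human review of whatever is flagged) could be sketched roughly as follows. This is not the Pecas implementation: the `TimeEntry` fields, the specific rules, the threshold, and the keyword-based `toy_classifier` standing in for a trained model are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeEntry:
    hours: float
    project: Optional[str]
    description: str


def rule_violations(entry):
    """Hard rules: cheap, deterministic checks for obvious mistakes."""
    problems = []
    if entry.hours <= 0 or entry.hours > 24:
        problems.append("hours out of range")
    if not entry.project:
        problems.append("missing project")
    if not entry.description.strip():
        problems.append("empty description")
    return problems


def toy_classifier(description):
    """Placeholder for a trained binary classifier: returns a score in [0, 1]."""
    vague = {"stuff", "misc", "various", "work", "things"}
    words = description.lower().split()
    if not words:
        return 1.0
    return sum(word in vague for word in words) / len(words)


def triage(entries, classify=toy_classifier, threshold=0.5):
    """Return only the entries a human should review, with the reasons."""
    flagged = []
    for entry in entries:
        reasons = rule_violations(entry)
        # Only consult the classifier when the hard rules found nothing.
        if not reasons and classify(entry.description) >= threshold:
            reasons.append("description flagged by classifier")
        if reasons:
            flagged.append((entry, reasons))
    return flagged


entries = [
    TimeEntry(2.0, "acme", "Fixed pagination bug in invoices API"),
    TimeEntry(30.0, "acme", "Deploy"),
    TimeEntry(1.0, "acme", "misc stuff"),
]
flagged = triage(entries)
```

Running the rules first keeps the classifier focused on the free-text cases that rules cannot express, and the review queue contains only flagged entries.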

🤖 AI agents don't fail because of limited budgets but because of inconsistent data: without quality, even the most advanced automation loses value. #AI #DataQuality

🔗 https://www.tomshw.it/business/agenti-ai-dati-infrastruttura-confluent-2026

AI agents don't fail for lack of budget: they fail because of bad data

A Confluent report covering 4,625 IT leaders finds that the real brake on AI in production is the data infrastructure: fragmented, slow, and lacking clear lineage.

Tom's Hardware

Your £2M Data Problem Becomes a £20M AI Risk by 2030

Subject: The £2M problem that becomes £20M in 2030

Hi

Why AI amplification will separate survivors from casualties.

By 2030, AI, quantum computing, and IoT will converge into an integrated technological ecosystem. If your data foundation is broken today, the convergence will not save you; it will destroy you.

I've spent 20+ years watching organizations invest billions in transformation while ignoring the one thing that determines success: their data foundation. I've seen the £15M transformation disasters. The £7M costs of fear-driven silence. The £2M annual bleeds from "just how things work."

But 2030 changes everything. Because AI doesn't just use your data. It amplifies it. Let me show you what this looks like in practice.

Most organisations already live with data issues. Inconsistent definitions. Missing fields. Duplicates. Stale records. Quiet reshaping of data as it moves between systems.

Today, many of these problems are contained because humans sit in the loop. An analyst questions a number. A manager challenges a report. Someone spots something that doesn't feel right before it triggers major action.

As we move toward 2030, that changes. AI-enabled workflows increasingly:

- make decisions automatically
- trigger downstream actions automatically (orders, pricing, eligibility, routing, fraud controls)
- operate continuously rather than weekly or monthly
- rely on multiple systems and external data feeds

In plain terms: the same error creates more consequences before anyone notices.

Read more in this blog and my book https://lizhendersondata.wordpress.com/your-unseen/

Best wishes
Liz Henderson - Data Queen
https://lizhendersondata.wordpress.com/your-unseen/

https://lizhendersondata.wordpress.com/2026/06/22/your-2m-data-problem/

Data quality is a pipeline problem, not a form fix. Learn how developers can enforce quality through profiling, matching, and workflow automation at scale. https://hackernoon.com/building-data-quality-into-the-pipeline-instead-of-cleaning-up-after-it #dataquality
Building Data Quality Into the Pipeline Instead of Cleaning Up After It | HackerNoon
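
As a rough illustration of two of the steps named in the post, profiling and matching, here is a minimal standard-library sketch. It is not taken from the linked article: the function names, the sample rows, and the whitespace-and-case normalization used for matching are assumptions.

```python
from collections import Counter


def profile(rows, columns):
    """Per-column profile: share of missing values and number of distinct values."""
    stats = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        stats[col] = {
            "missing_rate": 1 - len(present) / len(values) if values else 0.0,
            "distinct": len(set(present)),
        }
    return stats


def normalize(value):
    """Very simple match key: lowercase and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def duplicate_groups(rows, key):
    """Match keys that occur more than once after normalization."""
    counts = Counter(normalize(row[key]) for row in rows if row.get(key))
    return {match_key: n for match_key, n in counts.items() if n > 1}


rows = [
    {"name": "Acme  Corp", "email": "info@acme.example"},
    {"name": "acme corp", "email": ""},
    {"name": "Globex", "email": "hello@globex.example"},
]

stats = profile(rows, ["name", "email"])
dupes = duplicate_groups(rows, "name")
```

In a pipeline, checks like these would run as a step on every load, failing or quarantining a batch when a threshold is exceeded, rather than as a one-off cleanup afterwards.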

🤖 Can #AI predict the quality of a survey question?

Our new study shows that a fine-tuned multilingual transformer can match the performance of SQP's traditional prediction approach—directly from question text.

📄 Yang, Schonlau, Repke, Felderer, & Sucholutsky (2026)
https://doi.org/10.1093/jrsssa/qnag058

*SQP: the Survey Quality Predictor, a web-based tool that predicts the measurement quality of survey questions (https://sqp.gesis.org).

@GESIS
#SQP #DataQuality #SurveyMethodology

Ever tried restructuring your data quality rule flow and got blocked by validation errors at every step?

The Advanced Flow Editor in HEDDA.IO changes that. Rearrange your Business Rule connections freely, collect your changes, and save when you are ready - no more interruptions mid-edit.

Read the full article here 👇
https://hedda.io/advanced-flow-editor-in-heddaio/

#DataQuality #Datagovernance #OpenData

New blog post: 'Comically Bad' Data for Diabetes Models? You're Having a Laugh, Right?

Saw a headline about "comically bad" datasets used for clinical models in diabetes, and honestly, it's beyond a joke. When it comes to health tech, shoddy data isn't just an error; it's a bloody risk.

https://rhodzy.com/blog/comically-bad-data-for-diabetes-models-youre-having-a-laugh-right

#diabetes #tech #ai #dataquality #health #machinelearning

rhodzy.com

Building Data Catalogs: The Quiet Power Behind a Single Source of Truth.

A bold look at why strong data catalogs help firms build a clear single source of truth, cut noise, spark trust, and reshape how teams think and act.