Chapter 5 of my book, Test-Driven Data Analysis, is now freely available online at:

https://book.tdda.info/book/chapter5.html

The chapter is called Constraint Discovery and Validation, and is concerned with automatic generation of constraints from believed-to-be-good data and the use of those constraints for validation of new data.

The open-source Python tdda library makes this functionality available for data in Parquet files, CSV files and databases through language-neutral command-line tools: 'tdda discover' for generating constraints, and 'tdda verify' and 'tdda detect' for validating data. There is also a Python API for the same purpose.
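To make the idea concrete, here is a minimal sketch of constraint discovery and verification in plain Python. It is not the tdda library or its API: `discover_constraints` and `verify` are hypothetical helpers written for illustration, and the constraint kinds (type, nulls, min, max) are an assumed, simplified subset.

```python
# Illustrative sketch only: discover_constraints and verify are hypothetical
# helpers, not functions from the tdda library.

def discover_constraints(rows):
    """Infer simple per-field constraints from records believed to be good."""
    constraints = {}
    fields = {key for row in rows for key in row}
    for field in sorted(fields):
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        c = {
            "type": type(present[0]).__name__ if present else None,
            "allow_nulls": len(present) < len(values),
        }
        # Only numeric fields get a min/max range in this sketch.
        if present and all(isinstance(v, (int, float)) for v in present):
            c["min"] = min(present)
            c["max"] = max(present)
        constraints[field] = c
    return constraints


def verify(rows, constraints):
    """Return (row_index, field, reason) for every constraint violation."""
    failures = []
    for i, row in enumerate(rows):
        for field, c in constraints.items():
            value = row.get(field)
            if value is None:
                if not c["allow_nulls"]:
                    failures.append((i, field, "null not allowed"))
                continue
            if c["type"] is not None and type(value).__name__ != c["type"]:
                failures.append((i, field, "wrong type"))
                continue
            if "min" in c and value < c["min"]:
                failures.append((i, field, "below min"))
            if "max" in c and value > c["max"]:
                failures.append((i, field, "above max"))
    return failures


# Made-up example data: "good" is the reference set, "new" is to be checked.
good = [
    {"age": 34, "name": "Ada"},
    {"age": 51, "name": "Grace"},
]
new = [
    {"age": 40, "name": "Alan"},
    {"age": 130, "name": None},
]

constraints = discover_constraints(good)
failures = verify(new, constraints)
for failure in failures:
    print(failure)
```

The second new record is flagged twice, once for an age outside the discovered range and once for a missing name. Discovered ranges can be tighter than intended, so in practice generated constraints are usually reviewed before being used for validation.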

The print edition of the book remains available from all good booksellers and all sellers of good books, and the publisher has a 20% discount available until 30 June at https://www.routledge.com/Test-Driven-Data-Analysis/Radcliffe/p/book/9781032897158 with code 26SMA1.

#book #tdda #data #testing #datavalidation #datascience #ML #AI #books

New educational video: learn to use data validation with a drop-down list and conditional formatting in a practical project, an interactive notepad. 🎥

https://www.youtube.com/watch?v=J4-3fe8vbY4

#Excel #DataValidation #PodmineneFormatovani #Tutorial #Vzdelavani

MS Excel - How to create an interactive notepad #ověřenídat #podmíněnéformátování

Data Engineer - Remote

Automate data workflows; Build data pipelines; Collaborate with AI researchers; Collaborate with data scientists; Design data pipelines; Design data schemas; Develop data models; Develop storage systems; Ensure data reliability; Ensure data integrity; Ensure data quality; Explore datasets; Extract, transform, and analyze data; Implement data monitoring; Implement data validation; Ingest data from multiple sources; Maintain data pipelines; Prepare datasets for experimentation; Prepare datasets for model training; Process data; Transform data into structured formats; Write Python scripts; Write SQL queries.


Justin Castilla has a Spotlight Session at Nebraska.Code() this July.

Find 'Tracking Longterm Health with a Sympathetic Voice - empowering Agentic AI to actually listen' here:

https://nebraskacode.amegala.com/

#AgenticAI #ModelContextProtocol #MCP #LLMs #SemanticSearch #ImplementationPatterns #DataValidation #Workflows #Python #API #TechConference #Spotlight #NebraskaTech #developercommunity

Data scrubbing helps improve data quality by identifying and fixing errors, duplicates, and outdated information. It ensures accurate, consistent datasets for better analytics and decision-making.

With proper validation, standardization, and regular updates, businesses can maintain clean data, improve insights, and streamline operations.
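As a rough illustration of validation, standardization and duplicate removal, here is a minimal Python sketch. The record layout, the `scrub` helper and the deliberately loose email pattern are assumptions made for this example, not a method described in the linked guide.

```python
import re

# Deliberately loose pattern (an assumption for this sketch, not a full
# email validator): something@something.something with no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def scrub(records):
    """Standardize, validate and de-duplicate contact records."""
    seen = set()
    clean, rejected = [], []
    for rec in records:
        # Standardize: trim and lowercase emails, normalize name spacing and case.
        email = rec.get("email", "").strip().lower()
        name = " ".join(rec.get("name", "").split()).title()
        # Validate: set aside records whose email does not look plausible.
        if not EMAIL_RE.match(email):
            rejected.append(rec)
            continue
        # De-duplicate: keep the first record seen for each email.
        if email in seen:
            continue
        seen.add(email)
        clean.append({"name": name, "email": email})
    return clean, rejected


# Made-up example data.
raw = [
    {"name": "  ada   lovelace ", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Bad Row", "email": "not-an-email"},
]
clean, rejected = scrub(raw)
print(clean)
print(rejected)
```

Rejected records are returned rather than silently dropped, so they can be reviewed or repaired later.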

Read more: https://www.habiledata.com/blog/data-scrubbing-guide/

#DataScrubbing #DataCleansing #DataQuality #DataValidation

Reliable business insights via B2B data aggregation

Accurate business data supports analytics, outreach, and planning. Data aggregation for B2B companies centralizes information from diverse sources and validates accuracy through structured checks. Continuous enrichment ensures datasets remain reliable and up to date.

Know more: https://www.hitechdigital.com/b2b-data-aggregation

#B2BDataAggregation #DataAggregationServices #DataEnrichment #DataCleansing #DataValidation #B2BDataSolutions #DataQualityManagement

marmelab web developer Thiery Michel shares in this article how he uses PostgreSQL features to move certain data validation logic out of the application layer and into the database. In some cases the proposed solutions can be more elegant than validating purely in the application layer.

"9 Advanced PostgreSQL Features I Wish I Knew Sooner"

https://marmelab.com/blog/2026/02/23/do-you-know-psql.html
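The general idea of letting the database enforce validation can be sketched as follows. This is an assumption-laden illustration, not taken from the article: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, a hypothetical `users` table, and plain `CHECK`, `NOT NULL` and `UNIQUE` constraints, which may or may not be among the nine features the article covers.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the table and rules are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE CHECK (email LIKE '%_@_%._%'),
        age   INTEGER CHECK (age BETWEEN 0 AND 150)
    )
    """
)


def try_insert(email, age):
    """Insert a row; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (email, age) VALUES (?, ?)", (email, age)
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("ada@example.com", 36),    # valid row
    try_insert("not-an-email", 36),       # rejected by the email CHECK
    try_insert("grace@example.com", -5),  # rejected by the age CHECK
    try_insert("ada@example.com", 40),    # rejected by UNIQUE
]
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(results, row_count)
```

Because the rules live in the schema, every client that writes to the table is subject to them, not only the application code that remembers to validate.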

#programming #sql #postgresql #database #datavalidation

Learn how developers use data validation APIs to verify emails, addresses, phone numbers, and identities to improve data quality, security, and app performance. https://hackernoon.com/apis-for-data-validation-a-developers-practical-guide #datavalidation
APIs for Data Validation: A Developer’s Practical Guide | HackerNoon


Code changes have AI review tools. Data changes don't... until now.

Our own Kent Chen wrote about how the team built a multi-agent system with Claude Agent SDK and MCP that reviews data changes in every dbt PR. Orchestrator + two specialist agents using 6 Recce MCP tools for lineage diffs, schema diffs, row counts, and custom queries.

AI Data Review lands in the PR automatically. No manual queries.

https://blog.reccehq.com/designing-reliable-ai-agents-for-dbt-data-reviews

#dbt #DataEngineering #DataValidation #AI #MCP

Designing Reliable AI Agents for dbt Data Reviews

Code changes have AI review tools. Data changes don't... until now. Here's how we went from a single prompt to an AI agent that performs the first pass on data validation in every PR.