Chapter 5 of my book, Test-Driven Data Analysis, is now freely available online at:

https://book.tdda.info/book/chapter5.html.

The chapter is called Constraint Discovery and Validation, and is concerned with automatic generation of constraints from believed-to-be-good data and the use of those constraints for validation of new data.

The open-source Python tdda library makes this functionality available for data in Parquet files, CSV files and databases through language-neutral command-line tools: 'tdda discover' for generating constraints, and 'tdda verify' and 'tdda detect' for validating data. There is also a Python API for the same purpose.
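
To give a feel for the discover-then-verify idea, here is a minimal pure-Python sketch. It is not the tdda implementation, API or constraint file format: the function names `discover` and `verify`, the constraint keys, and the sample rows are all invented for illustration, and each field is assumed to hold values of a single type.

```python
NUMERIC = (int, float)


def discover(rows):
    """Infer simple per-field constraints from believed-to-be-good rows.

    rows is a list of dicts; None stands for a missing value.
    Assumes each field holds values of one type (illustrative only).
    """
    constraints = {}
    fields = sorted({name for row in rows for name in row})
    for name in fields:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        c = {"allow_nulls": len(present) < len(values)}
        if present:
            c["type"] = type(present[0]).__name__
            if isinstance(present[0], NUMERIC):
                c["min"] = min(present)
                c["max"] = max(present)
        constraints[name] = c
    return constraints


def verify(rows, constraints):
    """Return the (field, constraint) pairs that the rows violate."""
    failures = []
    for name, c in constraints.items():
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        if not c["allow_nulls"] and len(present) < len(values):
            failures.append((name, "allow_nulls"))
        if "type" in c and any(type(v).__name__ != c["type"] for v in present):
            failures.append((name, "type"))
            continue  # range checks are meaningless on the wrong type
        if "min" in c and present and min(present) < c["min"]:
            failures.append((name, "min"))
        if "max" in c and present and max(present) > c["max"]:
            failures.append((name, "max"))
    return failures


# Hypothetical data: "good" is the trusted sample, "new" is to be checked.
good = [{"age": 31, "name": "Ann"}, {"age": 58, "name": "Bob"}]
new = [{"age": 17, "name": "Cy"}, {"age": None, "name": "Di"}]

constraints = discover(good)
print(verify(good, constraints))  # the trusted data passes its own constraints
print(verify(new, constraints))   # missing age and age below the observed minimum
```

The real tools infer a richer set of constraints and work directly on files and database tables; the sketch only shows the shape of the workflow.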

The print edition of the book remains available from all good booksellers and all sellers of good books, and the publisher has a 20% discount available until 30 June at https://www.routledge.com/Test-Driven-Data-Analysis/Radcliffe/p/book/9781032897158 with code 26SMA1.

#book #tdda #data #testing #datavalidation #datascience #ML #AI #books

Nick Radcliffe - Test-Driven Data Analysis | Pydata London 26

YouTube

The serialization of my book, Test-Driven Data Analysis, has reached chapter 4, which is available online at

https://book.tdda.info/book/chapter4.html

There is nothing more important than looking at data, and this chapter discusses how to construct a good combined profile and audit for each field in a dataset. Profiling shows the shape of the data, and auditing shows gaps, outliers etc.
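
As a rough sketch of what a combined profile and audit for one field could contain (not the method from the chapter; the function name `profile_field`, the chosen statistics and the 1.5 × IQR outlier rule are assumptions made for the example):

```python
from collections import Counter
from statistics import quantiles


def profile_field(values):
    """Summarise one field: shape (profile) plus gaps and outliers (audit).

    None stands for a missing value. Outliers are only computed for
    numeric fields with at least two non-missing values.
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)
    summary = {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "most_common": counts.most_common(3),
        "outliers": [],
    }
    numeric = all(isinstance(v, (int, float)) for v in present)
    if numeric and len(present) >= 2:
        q1, _, q3 = quantiles(present, n=4)
        spread = 1.5 * (q3 - q1)
        summary["outliers"] = sorted(
            v for v in present if v < q1 - spread or v > q3 + spread
        )
    return summary


# Hypothetical field with one gap and one suspicious value.
p = profile_field([12, 14, None, 15, 15, 16, 18, 95])
print(p["nulls"], p["distinct"], p["outliers"])
```

A real profile would usually add a visual element such as a histogram; the point here is only that shape and anomalies can be reported side by side.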

Profiles for every field in a few datasets are also available in various forms at

https://book.tdda.info/profiles/

#tdda #book #books #testing #data #dataanalysis #visualization #python #rlang

Chapter 3 of my book, Test-Driven Data Analysis, is now available, free, to read online at https://book.tdda.info/book/chapter3.html

It's about textual data, Unicode, encodings, normalization, comparison, emoji and so forth.
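
For a small taste of why normalization matters for comparison, here is a sketch using only Python's standard `unicodedata` module. The helper name `same_text` is invented for the example, and NFC is just one reasonable choice of normal form, not necessarily the one the chapter recommends.

```python
import unicodedata

composed = "caf\u00e9"     # é as a single code point
decomposed = "cafe\u0301"  # e followed by a combining acute accent

# The two strings render identically but differ code point by code point.
print(composed == decomposed)           # False
print(len(composed), len(decomposed))   # 4 5


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(composed, decomposed))  # True

# Compatibility normalization (NFKC) also folds characters such as the
# "fi" ligature (U+FB01) into their plain equivalents.
print(unicodedata.normalize("NFKC", "\ufb01le"))  # file
```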

Do check it out if you work with text.

I'm opening up a chapter each week. And of course, the book is available from all good booksellers, and all sellers of good books. You can get 20% off with code 26SMA1 at the publisher’s site, https://www.routledge.com/Test-Driven-Data-Analysis/Radcliffe/p/book/9781032897158.

Amazon says it ships by Father’s Day… and presumably by the next Mother's Day too, in case you need something wholesome for the parent who has everything except a good approach to validating data analyses!

#tdda #book #books #data #testing

Chapter 2 of the TDDA book is now live on the open web at

https://book.tdda.info/book/chapter2.html

It makes the case for data validation and surveys different kinds of data quality issues, introducing x-nulls and μ-nulls.

#TDDA #book #data #testing #reproducibility #dataanalysis #ML #AI

TDDA Book Online Serialization

As announced a few days ago, my book, Test-Driven Data Analysis, is now available for sale from all good booksellers and all sellers of good books, around the world. The book is aimed at analysts, data scientists, engineers, researchers and anyone else interested in making analytical processes more reliable, testable …

Test-Driven Data Analysis

The TDDA book, https://book.tdda.info, is being serialised, online, for free.

You can read chapter 1 right now, and get access to the glossary, bibliography, figures, profiles, example data, checklists and more at https://book.tdda.info. Chapter 2 will be available next week and all chapters will be available by September this year.

You can get notified as new chapters are released by signing up at https://book.tdda.info/notify.

You can, of course, get print and ebook copies from all good booksellers and all sellers of good books, and the code 26SMA1 will get you 20% off at the publisher's site, https://www.routledge.com/Test-Driven-Data-Analysis/Radcliffe/p/book/9781032897158.

#TDDA #book #books #python #data #dataanalysis #ML #AI #TDD #Rlang #datascience

CSV Metadata and tdda.serial: A Guide for LLMs and Coding Agents

https://www.tdda.info/csv-metadata-and-tddaserial-a-guide-for-llms-and-coding-agents

On tdda.serial, csvw, and Frictionless

#tdda #tddaserial #csv #metadata #data #csvw #frictionless

Raw markdown for direct use in LLM context windows. This post is addressed primarily to LLMs and coding agents. It covers the tdda.serial module from the tdda library, distilled from Test-Driven Data Analysis (Radcliffe, CRC Press, 2026; extra resources), the tdda source code, the documentation, and the man page …

Test-Driven Data Analysis

Version 3.0 of the tdda library and command-line tools has shipped:

python -m pip install -U tdda

or the usual variations.

Source: https://github.com/tdda/tdda.git
Docs: https://tdda.readthedocs.io/en/latest/
Book: https://book.tdda.info
Book 20% discount code: 26SMA1 at https://www.routledge.com/Test-Driven-Data-Analysis/Radcliffe/p/book/9781032897158

- Command-line tools for data validation (including constraint inference from training data)
- Reference testing (semantic testing of complex results)
- Automatic test-generation (any language)
- Format and utilities for working more safely with flat files (e.g. CSV files) using tdda.serial, CSVW or Frictionless metadata, with conversion utilities and format inference available
- Utilities for Unicode text (glyph counting and TK normal form, which goes beyond NFKC and NFKD)

3.0 includes
- Support for Pandas 3.0 (original, numpy_nullable, and pyarrow backends)
- Support for Polars in most areas
- Comprehensive Parquet support (replacing feather)
- Man pages for all commands
- Upgraded help docs for the whole library
- Associated book with the methodology
- 22 checklists for methodological support in areas unsuitable for code support.

#tdda #python #data #dataanalysis #ml #AI #rlang #datascience #testing #book #books #reproducibility #reproducibleresearch

All the top cats are reading the same book today.

Be like Mozzie and Alfie and get hold of a copy of the TDDA book on improving data analysis and data quality.

Use the discount code 26ESA2 at the publisher's website for a 20% discount on any format:

https://www.routledge.com/Test-Driven-Data-Analysis/Radcliffe/p/book/9781032896700

Available from all good booksellers and all sellers of good books, shipping Tuesday 19th June.

The matching 3.0 release of the tdda command-line tool and Python library will ship on Monday with myriad new features.

I'm assured that it makes a great present for the geek in your life and for all the best pets.

#book #TDDA #data #dataanalysis #analysis #ML #AI #python #Rlang #books