RE: https://floss.social/@janriemer/114760556247092176

New version of #CSVDiff is out!  

https://crates.io/crates/csv-diff

It now uses `extract_if` instead of a hacky drain-then-filter implementation, so you can e.g. diff your X-mas wishlist against your "actual-gifts-received" list _25% faster_ (and be happy if no diff is reported)! 🎅 🚀

⚠️ The new version has an MSRV of 1.88 (in order to use `extract_if`)!

See the complete changelog for details:
https://gitlab.com/janriemer/csv-diff/-/blob/main/CHANGELOG.md#012-23-december-2025

Happy X-Mas y'all! 🎄 🎁

#Rust #RustLang #CSV #Crate #Release

I can't wait to use `extract_if` in #CSVDiff 

csv-diff currently uses a "manual" (aka hacky) implementation of it: `drain` removes the equal CSV records, and an "intermediate" HashMap restores the not-to-be-removed CSV records (the ones that are different):

https://gitlab.com/janriemer/csv-diff/-/blob/main/src/diff_result.rs?ref_type=heads#L599

We can _probably_ remove this hacky implementation and replace it with `extract_if`! Very exciting!

2/2

#CSVDiff


#Fuzzing along in #CSVDiff  

In the second screenshot I've highlighted some interesting parts:

Key field indices are 2 and 3, so the records whose key fields are highlighted are diffed as `Modify`, because:
- key fields are equal between left and right record
- other fields are unequal between left and right record

The other two records on the right have no corresponding left record, so those are `Add`ed records.

#Rust #FuzzTesting #RustLang #PropertyTesting
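
The classification rule above can be sketched roughly like this. It is a toy version, not csv-diff's implementation: the `Diff` enum, `key_of`, `row`, and `diff` are hypothetical names, keys are assumed to be unique per side, and the `Delete` case is added here only to make the sketch complete.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Diff {
    Add(Vec<String>),
    Modify { left: Vec<String>, right: Vec<String> },
    Delete(Vec<String>),
}

// Illustrative helper to build a record from string slices.
fn row(fields: &[&str]) -> Vec<String> {
    fields.iter().map(|f| f.to_string()).collect()
}

// The key of a record is made of the fields at the given indices.
fn key_of(record: &[String], key_idx: &[usize]) -> Vec<String> {
    key_idx.iter().map(|&i| record[i].clone()).collect()
}

fn diff(left: &[Vec<String>], right: &[Vec<String>], key_idx: &[usize]) -> Vec<Diff> {
    // Index the left records by key (assumes keys are unique).
    let mut left_by_key: HashMap<Vec<String>, &Vec<String>> =
        left.iter().map(|r| (key_of(r, key_idx), r)).collect();

    let mut out = Vec::new();
    for r in right {
        match left_by_key.remove(&key_of(r, key_idx)) {
            // Same key, same fields: nothing to report.
            Some(l) if l == r => {}
            // Same key, but other fields differ: `Modify`.
            Some(l) => out.push(Diff::Modify { left: l.clone(), right: r.clone() }),
            // No left record with this key: `Add`.
            None => out.push(Diff::Add(r.clone())),
        }
    }

    // Left records that were never matched (assumption: reported as `Delete`).
    let mut deleted: Vec<Vec<String>> = left_by_key.into_values().cloned().collect();
    deleted.sort(); // HashMap order is arbitrary, so sort for stable output
    out.extend(deleted.into_iter().map(Diff::Delete));
    out
}
```

With key indices `[2, 3]`, a right record that shares fields 2 and 3 with a left record but differs elsewhere comes out as `Modify`, and right records without a matching key come out as `Add`.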

Huh, seems like I really have been living on the bleeding edge (of #FormalVerification):

https://github.com/creusot-rs/creusot/discussions/1477#discussioncomment-12991148

The verification in the prev toot is currently not possible in #Creusot due to missing specs for the `Hash` trait and HashMap more broadly. 😔

Oh well, seems like (at least currently!) I won't be able to fully verify the diffing algorithm of #CSVDiff.🥺

Options I have now are:
- Only verify parts of the algorithm (that don't depend on HashMap ops)
or
- Use fuzzing/property testing
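
For the second option, here is a rough idea of what a property test could look like, written with a hand-rolled PRNG so it needs no external crates. Everything in it is an assumption for illustration: `diff_counts` is a toy stand-in for the function under test (not csv-diff's algorithm), and the property checked is just one plausible candidate, namely that diffing a record set against a shuffled copy of itself reports nothing.

```rust
use std::collections::HashMap;

/// Tiny xorshift PRNG (the seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Toy stand-in for the function under test: returns the number of
/// (added, modified, deleted) records, keyed by the first tuple field.
fn diff_counts(left: &[(u64, u64)], right: &[(u64, u64)]) -> (usize, usize, usize) {
    let mut by_key: HashMap<u64, u64> = left.iter().copied().collect();
    let (mut added, mut modified) = (0, 0);
    for (key, value) in right {
        match by_key.remove(key) {
            Some(old) if old == *value => {}
            Some(_) => modified += 1,
            None => added += 1,
        }
    }
    // Whatever is still in the map had no counterpart on the right.
    (added, modified, by_key.len())
}

/// Property: a record set diffed against a shuffled copy of itself has no diff.
fn self_diff_is_empty(seed: u64, cases: usize) -> bool {
    let mut rng = XorShift(seed);
    (0..cases).all(|_| {
        let len = rng.next_u64() % 20;
        // Unique keys 0..len, each with a random value.
        let left: Vec<(u64, u64)> = (0..len).map(|k| (k, rng.next_u64())).collect();
        let mut right = left.clone();
        // Fisher-Yates shuffle of the right side.
        for i in (1..right.len()).rev() {
            let j = (rng.next_u64() % (i as u64 + 1)) as usize;
            right.swap(i, j);
        }
        diff_counts(&left, &right) == (0, 0, 0)
    })
}
```

A real setup would more likely use a property-testing or fuzzing crate for input generation and shrinking; the sketch only shows the shape of such a check.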

Just published a new version of csv-diff (v0.1.1) 🚀

https://lib.rs/crates/csv-diff

This fixes a nasty bug regarding sort order of modified csv records. 😖

Details in the MR/PR:
https://gitlab.com/janriemer/csv-diff/-/merge_requests/31

Also, two new incoming PRs for #qsv, the #CSV toolkit:

The first updates to the latest csv-diff, fixing aforementioned bug:
https://github.com/dathere/qsv/pull/2456

The second fixes a bug regarding conversion from column names to indices:
https://github.com/dathere/qsv/pull/2457

#Rust #RustLang #OpenSource #CSVDiff


Ouch, there is another bug and this time it is actually _in #CSVDiff itself_!

It happens with sorting the results of modified rows (urgh, I'm also not happy with the sorting code).😨

Thankfully, datatraveller1 has already found a reproducible example - thank you so much! ❤️

Bug:
https://github.com/dathere/qsv/issues/2443#issuecomment-2598987465

I think I've already found a solution, but it needs rigorous testing first!

Potential solution:
https://github.com/dathere/qsv/issues/2443#issuecomment-2599681431

#qsv #Bug #csv

BUG qsv diff produces different results for the same command · Issue #2443 · dathere/qsv

This is an interesting issue. Have you noticed that successive invocations of the same command with qsv diff give different results? The results are usually correct, but sometimes wrong. qsv diff -...

GitHub

Nice, I think I found the bug! 🐛

See the full explanation and a possible solution here:

=> https://github.com/dathere/qsv/issues/2443#issuecomment-2597097311

A workaround is also explained there, so this should be no blocker for people.

Will prob provide a fix on the weekend. 🤞

#CSVDiff #qsv #Bug #Fix #Bugfix

Uh ohhhh, someone reported a bug in qsv's `diff` command.😮 🙈

https://github.com/dathere/qsv/issues/2443

Hopefully, we can resolve this soon! 🤞🥺

I have a strong suspicion, but let's see... I need more info first from the OP.

#Bug #Issue #CSVDiff #Diff #CLI #qsv


@shuttle I consistently use #TDD wherever possible.

Yes, sure, #Rust prevents a lot of bugs at compile time already, but not logic bugs.

For example, in #CSVDiff we have ~70 unit tests and ~12 integration tests. The only "bug report" we have ever gotten was caused by a corrupted CSV file (which was mistaken for a bug in diff):

See here (qsv):
https://github.com/jqnatividad/qsv/issues/1258#issuecomment-1712924932

csv-diff:
https://gitlab.com/janriemer/csv-diff

In the future I'd like to add property and mutation testing as well 🤓

#RustLang #Testing #UnitTest

BUG: qsv diff fails for different delimiters for the left and right CSV files · Issue #1258 · jqnatividad/qsv

Describe the bug qsv diff fails for different delimiters for the left and right CSV files when trying to perform a diff on CSV files with different delimiters. To Reproduce Steps to reproduce the b...

GitHub

#CsvDiff has finally reached v0.1.0, its first-ever non-alpha/-beta release! 🎉

New features, like getting at the headers from the diff result, were needed for the following PR in qsv (which is in final review):
https://github.com/jqnatividad/qsv/pull/1395

Once it is merged, you'll be able to decide whether the diff result should output headers or not (see examples in the PR).

Check out csv-diff's Changelog for the full details:
https://gitlab.com/janriemer/csv-diff/-/blob/main/CHANGELOG.md?ref_type=heads#010-30-october-2023

#CSV #qsv #CLI #DataScience #DataEngineering

`diff`: add option/flag for headers in output by janriemer · Pull Request #1395 · jqnatividad/qsv

This implements a new option for diff: It is now possible to decide, whether the CSV headers in the compared CSVs should be in the diff output. If neither CSVs have headers and this option is not a...

GitHub