Soda - Inria

We are an INRIA research team working at the intersection of machine learning, health, and society.
Website: https://team.inria.fr/soda/
Github: https://github.com/soda-inria
Twitter: https://twitter.com/soda_INRIA

The team's annual report is out!
It's our first year and we are still ramping up, but our efforts already reflect our vision:
https://radar.inria.fr/report/2022/soda/index.html

Next year will be even more exciting, as we have many ongoing research projects in statistical learning, data management, health, and education.

dirty_cat 0.4.0 beta version is out!

Some brand new features in this release:
- the fuzzy_join function to join tables on imprecise matches;
- the FeatureAugmenter, a scikit-learn Transformer for joining multiple tables with dirty categories;
- the deduplicate function to regroup dirty categories in your table.
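
To give a feel for what joining on imprecise matches means, here is a minimal standard-library sketch. It is not dirty_cat's implementation and does not use its API: the names char_ngrams, similarity and toy_fuzzy_join, the Jaccard similarity on character 3-grams, and the threshold value are all assumptions made for illustration. For the real fuzzy_join signature, refer to the release notes linked in this post.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))}


def similarity(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def toy_fuzzy_join(left, right, left_on, right_on, threshold=0.2):
    """Left-join two lists of dicts, matching each left row to its most
    similar right row (hypothetical helper, not the dirty_cat function)."""
    joined = []
    for row in left:
        # Nearest neighbor: the right row whose key is most similar.
        best = max(
            right,
            key=lambda cand: similarity(row[left_on], cand[right_on]),
            default=None,
        )
        score = similarity(row[left_on], best[right_on]) if best else 0.0
        # Keep the match only if it is similar enough; otherwise keep the
        # left row alone, as in a left join with no match.
        match = best if score >= threshold else {}
        joined.append({**row, **match})
    return joined


# Made-up example tables with typos and stray whitespace in the join key.
left = [
    {"country": "france ", "score": 1},
    {"country": "Germnay", "score": 2},
    {"country": "Atlantis", "score": 3},
]
right = [
    {"name": "France", "capital": "Paris"},
    {"name": "Germany", "capital": "Berlin"},
]

joined = toy_fuzzy_join(left, right, left_on="country", right_on="name")
for row in joined:
    print(row)
```

In this toy version, the threshold trades off recall against false matches: a lower value accepts noisier keys, a higher one leaves more rows unmatched.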

You can install it:

pip install dirty-cat==0.4.0b1

Try it and give us feedback!

See all changes: https://github.com/dirty-cat/dirty_cat/releases/tag/0.4.0b1

And check out the examples at:
https://dirty-cat.github.io/dev/
