Mastodawn

Tabular data can benefit from merging external sources of information.

The FeatureAugmenter is a sklearn transformer to augment a given dataframe by joins on reference tables.
https://dirty-cat.github.io/stable/generated/dirty_cat.FeatureAugmenter.html

fuzzy_join makes it robust to mismatch in vocabulary. Hyperparameter optimization can tune matches for prediction

For such external information,
diry-cat can download embeddings of wikipedia data on millions of entities: companies, cities, geographic locations...
https://dirty-cat.github.io/stable/auto_examples/07_ken_embeddings_example.html

dirty_cat.FeatureAugmenter

Usage examples at the bottom of this page. Examples using dirty_cat.FeatureAugmenter: Fuzzy joining dirty tables and the FeatureAugmenter Fuzzy joining dirty tables and the FeatureAugmenter Wikiped...

dirty_cat