✨ Table representation learning preprint: automatic generation of features from relational data that capture the information distributed across many tables

Vector embeddings of entities in databases that expose information to learning models

https://soda-inria.github.io/ken_embeddings/

1/6

Relational Data Embeddings

KEN: Relational Data Embeddings for Feature Enrichment with Background Information

We tackle data preparation: merging multiple tables to enrich the data at hand, for instance adding information about the locality in a housing-price study.

This is a time-consuming feature-generation step in large databases.

We represent the data across many tables as a graph (like knowledge graphs) and adapt knowledge-graph embedding methods.

Briefly: column pairs define edge types and the discrete entries (entities) define the nodes. This kind of knowledge representation is very general, much like RDF.

2/6
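
To make the table-to-graph idea concrete, here is a minimal sketch in plain Python. It is not the KEN implementation: the toy table, the `table_to_triples` helper and the choice of `city` as key column are all hypothetical, and only discrete entries are handled. Each (key column, other column) pair becomes an edge type, and each entry becomes a node.

```python
# Hypothetical toy table: one row per entity, keyed by "city".
rows = [
    {"city": "Paris", "country": "France", "region": "Ile-de-France"},
    {"city": "Lyon", "country": "France", "region": "Auvergne-Rhone-Alpes"},
]


def table_to_triples(rows, key):
    """Return (head, relation, tail) triples for a table.

    The entry in the key column is the head node, every other
    column name labels an edge type, and the entry in that column
    is the tail node.
    """
    triples = []
    for row in rows:
        head = row[key]
        for column, value in row.items():
            if column == key or value is None:
                continue  # skip the key itself and missing entries
            triples.append((head, column, value))
    return triples


triples = table_to_triples(rows, key="city")

# Nodes are the distinct entries; edge types are the distinct relations.
nodes = sorted({node for head, _, tail in triples for node in (head, tail)})
edge_types = sorted({relation for _, relation, _ in triples})
```

Triples built this way from several tables can be pooled into one graph, because a shared entry such as `"France"` maps to the same node wherever it appears. That pooled list of triples is the usual input format for knowledge-graph embedding methods.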

@ogrisel yes, thanks! I messed up my threading. I'm still struggling with the Mastodon UI.

@GaelVaroquaux This looks very interesting.

However, I don't see the license for the datasets. I could download them without seeing any license.

@jfpuget Good point!

I've fixed this: I added an explicit license (CC-BY-4.0, following the data source from which ours is derived).