#jobim2024 Idriss André, on graph representation learning and semantic distribution, with application to omics expression data.

Multi-omics integration of sample-related data, with "fusion" [better choice than "integration", indeed, IMHO] algorithms. Introduces Knowledge Graphs. Nodes and edges have labels [important nuance].

Graph Embeddings: graph → vectors. Several approaches possible, uses WalkLM. Basically random-walks the graph to build up a text sequence from the labels, then feeds it to an LLM.
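A minimal sketch of the walk-to-text step, as I understood it from the talk: not WalkLM's actual code. The toy graph, its labels and the `walk_to_sentence` helper are all made up for illustration.

```python
import random

# Toy labeled graph (hypothetical): node label -> list of (edge label, neighbor label).
GRAPH = {
    "TP53": [("regulates", "CDKN1A"), ("participates in", "apoptosis")],
    "CDKN1A": [("participates in", "cell cycle arrest")],
    "apoptosis": [("involves", "TP53")],
    "cell cycle arrest": [("involves", "CDKN1A")],
}

def walk_to_sentence(graph, start, length, rng):
    """Random-walk up to `length` steps from `start`, joining node and edge labels."""
    tokens = [start]
    node = start
    for _ in range(length):
        neighbors = graph.get(node, [])
        if not neighbors:  # dead end: stop the walk early
            break
        edge_label, node = rng.choice(neighbors)
        tokens += [edge_label, node]
    return " ".join(tokens)

rng = random.Random(0)  # seeded so the walks are reproducible
sentences = [walk_to_sentence(GRAPH, "TP53", 3, rng) for _ in range(4)]
for sentence in sentences:
    print(sentence)
```

Each printed line is one "sentence" that could then be handed to a language model for embedding.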

Training from pairs of random-walk "sentences", scored by similarity; then embed, and score against the initial similarity.

A sample of walks forms a sample of vectors, hence a distribution in the embedding space. The distribution represents the entity of interest, so that a biological sample is seen as a parameter (e.g. the mean, weighted by the initial quantitative measurements) of the distribution issued from the object of interest.
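A rough sketch of the "weighted mean" reading, under my own assumptions: each walk embedding is tied to one measured feature (say, a gene), and the sample's expression values serve as weights. The vectors, the weights and the `weighted_mean` helper are hypothetical, not taken from the talk.

```python
def weighted_mean(vectors, weights):
    """Weighted mean of equal-length vectors; weights need not sum to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]

# Hypothetical 2-D embeddings of walks started from three genes,
# and one sample's expression levels for those same genes.
walk_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
expression = [2.0, 1.0, 1.0]

sample_vector = weighted_mean(walk_embeddings, expression)
print(sample_vector)  # [0.75, 0.5]
```

The sample ends up as a single point in the embedding space, pulled toward the walks of its most expressed features.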

Then, predicts cancer type on TCGA. Compares efficiency using transformed and raw vectors. The transformed vectors performed better, but not as well as the state of the art so far.

EOT