🌟 Sinkhorn-Knopp Algorithm: this algorithm is like Softmax, but focused on optimal transport in mathematics. Resources listed under #SinkhornKnoppAlgorithm #OptimalTransport #AI #ToánHọc #MachineLearning

https://www.reddit.com/r/programming/comments/1oc3ond/sinkhornknopp_algorithm_like_softmax_but_for/
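The "like softmax" analogy can be made concrete: softmax normalizes one axis of a score vector, while Sinkhorn-Knopp alternately normalizes the rows and columns of a positive matrix until both sum to one (a doubly stochastic matrix). A minimal sketch — the function name and iteration limits are illustrative, not from the linked post:

```python
import numpy as np

def sinkhorn_knopp(A, n_iter=1000, tol=1e-9):
    """Scale a positive matrix A into a doubly stochastic matrix
    by alternately normalizing its rows and columns."""
    A = np.asarray(A, dtype=float)
    for _ in range(n_iter):
        A = A / A.sum(axis=1, keepdims=True)  # rows sum to 1
        A = A / A.sum(axis=0, keepdims=True)  # columns sum to 1
        # stop once row sums survive the column normalization
        if np.allclose(A.sum(axis=1), 1.0, atol=tol):
            break
    return A

M = sinkhorn_knopp(np.array([[1.0, 2.0], [3.0, 4.0]]))
```

Sinkhorn's theorem guarantees convergence for any matrix with strictly positive entries; in entropic optimal transport the same iteration is applied to the Gibbs kernel of a cost matrix.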

📝💤 "Behold, the 'brief' intro to optimal transport where intuition triumphs over 'maths' because who needs rigor? 🙄 It's basically a #YouTube rabbit hole disguised as a blog, because nothing says 'understandable' like suggesting you watch a four-year-old lecture series. 📚📺"
https://alexhwilliams.info/itsneuronalblog/2020/10/09/optimal-transport/ #optimaltransport #rabbitHole #blogpost #mathintuition #lectureSeries #HackerNews #ngated
A Short Introduction to Optimal Transport and Wasserstein Distance · Its Neuronal

https://kantorovich.org/
As the site itself says "The Kantorovich Initiative is dedicated towards research and dissemination of modern mathematics of optimal transport towards a wide audience of researchers, students, industry, policy makers and the general public."
#optimaltransport #shape #geometry #mathematics
The Kantorovich Initiative

The #Wasserstein distance (#EMD), sliced Wasserstein distance (#SWD), and the #L2norm are common #metrics used to quantify the ‘distance’ between two distributions. This tutorial compares these three metrics and discusses their advantages and disadvantages.

🌎 https://www.fabriziomusacchio.com/blog/2023-07-26-wasserstein_vs_l2_norm/

#OptimalTransport #MachineLearning

Comparing Wasserstein distance, sliced Wasserstein distance, and L2 norm

In machine learning, especially when dealing with probability distributions or deep generative models, different metrics are used to quantify the ‘distance’ between two distributions. Among these, the Wasserstein distance (EMD), sliced Wasserstein distance (SWD), and the L2 norm, play an important role. Here, we compare these metrics and discuss their advantages and disadvantages.

Fabrizio Musacchio
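A rough sketch of the comparison the tutorial draws, using SciPy's 1D `wasserstein_distance`, an L2 norm between binned histograms, and a naive random-projection sliced variant — sample sizes, grids, and the projection count here are arbitrary choices, not the tutorial's:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
p = rng.normal(0.0, 1.0, 2000)   # samples from N(0, 1)
q = rng.normal(3.0, 1.0, 2000)   # same shape, mean shifted by 3

# 1D Wasserstein-1 distance between the two sample sets
w1 = wasserstein_distance(p, q)

# L2 norm between normalized histograms on a shared grid
bins = np.linspace(-5.0, 8.0, 100)
hp, _ = np.histogram(p, bins=bins, density=True)
hq, _ = np.histogram(q, bins=bins, density=True)
l2 = np.linalg.norm(hp - hq)

def sliced_wasserstein(X, Y, n_proj=50, rng=rng):
    """Sliced Wasserstein for d-dimensional samples: average the
    1D Wasserstein distance over random projection directions."""
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)          # unit direction
        total += wasserstein_distance(X @ theta, Y @ theta)
    return total / n_proj

X = rng.normal(0.0, 1.0, (500, 2))
Y = rng.normal(2.0, 1.0, (500, 2))
swd = sliced_wasserstein(X, Y)
```

For the shifted Gaussians, `w1` recovers roughly the mean shift of 3, while the L2 norm between the histograms saturates once the supports stop overlapping — the qualitative difference the tutorial highlights.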

This tutorial takes a different approach to explain the #Wasserstein distance (#EMD) by approximating the #EMD with cumulative distribution functions (#CDF), providing a more intuitive understanding of the metric.

🌎 https://www.fabriziomusacchio.com/blog/2023-07-24-wasserstein_distance_cdf_approximation/

#OptimalTransport

Approximating the Wasserstein distance with cumulative distribution functions

In the previous two posts, we’ve discussed the mathematical details of the Wasserstein distance, exploring its formal definition, its computation through linear programming and the Sinkhorn algorithm. In this post, we take a different approach by approximating the Wasserstein distance with cumulative distribution functions (CDF), providing a more intuitive understanding of the metric.

Fabrizio Musacchio
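In one dimension the CDF approach reduces to a closed form: W1 is the area between the two cumulative distribution functions. A sketch on a toy grid (the distributions are my own example, not the tutorial's):

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two discrete 1D distributions on a shared grid
x = np.linspace(-5.0, 10.0, 1001)
dx = x[1] - x[0]
p = np.exp(-0.5 * (x - 0.0) ** 2); p /= p.sum()   # ~ N(0, 1)
q = np.exp(-0.5 * (x - 3.0) ** 2); q /= q.sum()   # ~ N(3, 1)

# W1 as the area between the two CDFs
F, G = np.cumsum(p), np.cumsum(q)
w1_cdf = np.sum(np.abs(F - G)) * dx

# Cross-check against SciPy's exact 1D implementation
w1_ref = wasserstein_distance(x, x, u_weights=p, v_weights=q)
```

Both values come out near 3.0, the mean shift between the two Gaussians, which is exactly the W1 distance between equal-variance normals.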

Calculating the #Wasserstein distance (#EMD) 📈 can be computationally costly when using #LinearProgramming. The #Sinkhorn algorithm provides a computationally efficient method for approximating the EMD, making it a practical choice for many applications, especially for large datasets 💫. Here is another tutorial showing how to solve the #OptimalTransport problem with the Sinkhorn algorithm in #Python 🐍

🌎 https://www.fabriziomusacchio.com/blog/2023-07-23-wasserstein_distance_sinkhorn/

Wasserstein distance via entropy regularization (Sinkhorn algorithm)

Calculating the Wasserstein distance can be computationally costly when using linear programming. The Sinkhorn algorithm provides a computationally efficient method for approximating the Wasserstein distance, making it a practical choice for many applications, especially for large datasets.

Fabrizio Musacchio
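The core of the entropy-regularized approach fits in a few lines of NumPy: regularizing the transport objective with an entropy term turns it into a matrix-scaling problem solved by Sinkhorn iterations on the Gibbs kernel. The regularization strength, grid, and distributions below are illustrative choices, not the tutorial's:

```python
import numpy as np

def sinkhorn_ot(a, b, C, eps=0.05, n_iter=2000):
    """Entropy-regularized optimal transport between discrete
    distributions a and b with cost matrix C, via Sinkhorn
    iterations on the Gibbs kernel K = exp(-C / eps)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # transport plan
    return P, np.sum(P * C)           # plan and transport cost

# Two small discrete distributions on [0, 1]
x = np.linspace(0.0, 1.0, 50)
a = np.exp(-((x - 0.2) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.01); b /= b.sum()
C = np.abs(x[:, None] - x[None, :])   # ground cost |x_i - y_j|

P, cost = sinkhorn_ot(a, b, C)
```

Since the two bumps have identical shape and are shifted by 0.5, the exact W1 distance is 0.5; the entropic cost lands slightly above it, with the bias shrinking as `eps` decreases (at the price of slower, less stable iterations).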

The #Wasserstein distance 📐, aka Earth Mover’s Distance (#EMD), provides a robust and insightful approach for comparing #ProbabilityDistributions 📊. I’ve composed a #Python tutorial 🐍 that explains the #OptimalTransport problem required to calculate EMD. It also shows how to solve the OT problem and calculate the EMD using the Python Optimal Transport (POT) library. Feel free to use and share it 🤗

🌎 https://www.fabriziomusacchio.com/blog/2023-07-23-wasserstein_distance/

Wasserstein distance and optimal transport

The Wasserstein distance, also known as the Earth Mover’s Distance (EMD), provides a robust and insightful approach for comparing probability distributions and finds application in various fields such as machine learning, data science, image processing, and information theory. In this post, we take a look at the optimal transport problem, required to calculate the Wasserstein distance, and how to calculate the distance metric in Python.

Fabrizio Musacchio
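The optimal transport problem the tutorial sets up is a linear program: minimize the total cost of a nonnegative transport plan whose row sums match the source marginals and whose column sums match the target marginals. POT's `ot.emd` solves it with a specialized network-simplex solver; a generic-solver sketch with `scipy.optimize.linprog` on a hypothetical 2×3 example:

```python
import numpy as np
from scipy.optimize import linprog

a = np.array([0.5, 0.5])            # source distribution
b = np.array([0.25, 0.25, 0.5])     # target distribution
x_src = np.array([0.0, 1.0])        # source support points
x_tgt = np.array([0.0, 0.5, 1.0])   # target support points
C = np.abs(x_src[:, None] - x_tgt[None, :])  # cost matrix (2, 3)

n, m = C.shape
# Variables: the transport plan P flattened row-major, P >= 0.
# Equality constraints: row i of P sums to a[i], column j to b[j].
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0   # row-sum constraint
for j in range(m):
    A_eq[n + j, j::m] = 1.0            # column-sum constraint
b_eq = np.concatenate([a, b])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
emd = res.fun   # optimal transport cost (W1 for this ground cost)
```

Here the optimum is 0.125: mass 0.25 must travel a distance of 0.5 to fill the middle target bin, and everything else stays put. This generic LP scales poorly, which is exactly the motivation for the Sinkhorn approximation in the companion post.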

#OptimalTransport: Moving stuff through a #labyrinth

(Nicolas Papadakis: Optimal Transport for Image Processing. Signal and Image Processing. Université de Bordeaux; Habilitation thesis, 2015. tel-01246096v8)

Our Pick of the week: Phuong-Hang Le et al., "Pre-training for Speech Translation: CTC Meets Optimal Transport"
by @mgaido91

 https://arxiv.org/abs/2301.11716

#NLProc #optimaltransport #CTC #speechtranslation

Pre-training for Speech Translation: CTC Meets Optimal Transport

The gap between speech and text modalities is a major challenge in speech-to-text translation (ST). Different methods have been proposed to reduce this gap, but most of them require architectural changes in ST training. In this work, we propose to mitigate this issue at the pre-training stage, requiring no change in the ST model. First, we show that the connectionist temporal classification (CTC) loss can reduce the modality gap by design. We provide a quantitative comparison with the more common cross-entropy loss, showing that pre-training with CTC consistently achieves better final ST accuracy. Nevertheless, CTC is only a partial solution and thus, in our second contribution, we propose a novel pre-training method combining CTC and optimal transport to further reduce this gap. Our method pre-trains a Siamese-like model composed of two encoders, one for acoustic inputs and the other for textual inputs, such that they produce representations that are close to each other in the Wasserstein space. Extensive experiments on the standard CoVoST-2 and MuST-C datasets show that our pre-training method applied to the vanilla encoder-decoder Transformer achieves state-of-the-art performance under the no-external-data setting, and performs on par with recent strong multi-task learning systems trained with external data. Finally, our method can also be applied on top of these multi-task systems, leading to further improvements for these models.

arXiv.org