New preprint: Signature and Log-signature for the Study of Empirical Distributions Generated with GANs #GenerativeAdversarialNetworks #SignatureTransform
https://arxiv.org/abs/2203.03226
Signature and Log-signature for the Study of Empirical Distributions Generated with GANs

In this paper, we bring forward the use of the recently developed Signature Transform as a way to measure the similarity between image distributions, providing a detailed introduction and extensive evaluations. We are the first to propose RMSE and MAE Signature, along with the log-signature, as alternatives for measuring GAN convergence, a problem that has been extensively studied. We also introduce analytical, statistics-based measures to study the goodness of fit of the GAN sample distribution that are both efficient and effective. Current GAN measures involve substantial computation, normally performed on the GPU, and are very time consuming. In contrast, we reduce the computation time to the order of seconds, with all computation done on the CPU, while achieving the same level of goodness. Lastly, a PCA adaptive t-SNE approach, which is novel in this context, is also proposed for data visualization.
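To make the core idea concrete, here is a minimal sketch of how RMSE and MAE over truncated signatures could be computed. The paper does not publish this exact routine; the depth-2 truncation, the `sig2`/`rmse_mae` helpers, and the treatment of an input as a piecewise-linear path are all assumptions for illustration. Levels are accumulated segment by segment with Chen's identity.

```python
import numpy as np

def sig2(path):
    """Truncated signature of a piecewise-linear path up to depth 2.

    For each linear segment with increment d, Chen's identity gives
        S2 <- S2 + outer(S1, d) + 0.5 * outer(d, d)
        S1 <- S1 + d
    Returns the flattened (level-1, level-2) coefficients.
    """
    path = np.asarray(path, dtype=float)
    dim = path.shape[1]
    s1 = np.zeros(dim)
    s2 = np.zeros((dim, dim))
    for d in np.diff(path, axis=0):
        s2 += np.outer(s1, d) + 0.5 * np.outer(d, d)
        s1 += d
    return np.concatenate([s1, s2.ravel()])

def rmse_mae(sig_a, sig_b):
    """Elementwise RMSE and MAE between two signature vectors."""
    diff = sig_a - sig_b
    return np.sqrt(np.mean(diff ** 2)), np.mean(np.abs(diff))

# Toy usage: two 2-D paths standing in for flattened image data.
ref = sig2([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
gen = sig2([[0.0, 0.0], [1.1, 1.9], [2.9, 1.2]])
rmse, mae = rmse_mae(ref, gen)
```

Because the depth-2 signature of a path is a short fixed-length vector, these comparisons run in fractions of a second on a CPU, which is consistent with the efficiency claim above.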

Summarization of Videos with the Signature Transform

This manuscript proposes a new benchmark to assess the goodness of visual summaries without the need for human annotators. It is based on the Signature Transform, specifically on RMSE and MAE Signature and Log-Signature, and builds on the assumption that uniform random sampling can provide accurate summarization capabilities. First, we introduce a preliminary baseline for automatic video summarization, which has at its core a Vision Transformer, an image-text model pre-trained with contrastive learning (CLIP), and an object-detection module. Our baseline leverages video text descriptions to determine the most frequent nouns to use as anchors, and then performs an open-vocabulary image search on the video frames. This enables zero-shot text-conditioned object detection to select the frames for the final video summary. Despite not requiring any fine-tuning, our approach provides accurate summaries on a wide range of video data. Since few datasets are available for this task, a new dataset consisting of YouTube videos and the corresponding automatic audio transcriptions is provided. Then, a state-of-the-art technique based on the harmonic components that the Signature Transform is able to capture is proposed, achieving compelling accuracy and outperforming previous methodologies. The analytical measures are extensively evaluated, and we conclude that they correlate very well with the notion of a good summary.
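The two ingredients above, frequent-word anchors from the video's text description and the uniform-random-sampling assumption, can be sketched as follows. This is not the paper's pipeline: the paper selects frequent *nouns* (which requires part-of-speech tagging) and then runs CLIP-based open-vocabulary search; here plain word frequency and index sampling stand in for those steps, and the helper names are hypothetical.

```python
import random
from collections import Counter

def top_anchors(description, n=3):
    """Most frequent words in a description, as stand-in anchors.

    Simplification: no part-of-speech tagging, so stopwords are
    counted like any other word.
    """
    words = [w.strip(".,!?").lower() for w in description.split()]
    return [w for w, _ in Counter(words).most_common(n)]

def uniform_summary_indices(num_frames, k, seed=0):
    """Sample k frame indices uniformly at random, sorted in time,
    reflecting the benchmark's uniform-random-sampling assumption."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_frames), k))

# Toy usage: anchors from a transcript, then a random 5-frame summary.
anchors = top_anchors("a dog chases the ball, then the dog rests")
summary = uniform_summary_indices(num_frames=300, k=5)
```

In the full system, each selected frame would additionally be scored against the anchors with a text-conditioned detector before entering the final summary.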

figshare