J. de Curtò

9 Followers
93 Following
7 Posts
Engineer and Computer Scientist. Loves rockets, satellites and space exploration.
Webpage: grc.webs.upv.es/members/decurto
Our paper "Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles" has just been accepted for publication in MDPI Drones. Read more about our work here: https://mdpi.com/2504-446X/7/2/114 #UAV #LargeLanguageModels #SceneUnderstanding #Drones
Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles

Unmanned Aerial Vehicles (UAVs) are able to provide instantaneous visual cues and a high-level data throughput that could be further leveraged to address complex tasks, such as semantically rich scene understanding. In this work, we build on Large Language Models (LLMs) and Visual Language Models (VLMs), together with a state-of-the-art detection pipeline, to provide thorough zero-shot UAV scene literary text descriptions. The generated texts achieve a Gunning fog median grade level in the range of 7–12. This framework has applications in the filming industry and could enhance user experience in theme parks or in the advertisement sector. We demonstrate a low-cost, highly efficient, state-of-the-art practical implementation of microdrones in a well-controlled and challenging setting, in addition to proposing the use of standardized readability metrics to assess LLM-enhanced descriptions.
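The Gunning fog grade level mentioned in the abstract has a standard formula: 0.4 × (words per sentence + 100 × complex words per word), where complex words have three or more syllables. A minimal sketch, assuming a crude vowel-group heuristic for syllable counting (the paper's exact tooling is not specified):

```python
import re

def count_syllables(word):
    # crude heuristic: each maximal run of vowels counts as one syllable
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    """Gunning fog index: 0.4 * (words/sentences + 100 * complex_words/words).
    Complex words are approximated as words with 3+ syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

sample = ("Unmanned aerial vehicles provide instantaneous visual cues. "
          "They can address semantically rich scene understanding.")
print(round(gunning_fog(sample), 1))
```

A grade in the 7–12 range roughly corresponds to text readable by a middle- to high-school student.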

MDPI
It's been one year since we married. We went to Lantau to celebrate! @DeZarza
A book that changed me: Programming Pearls (Bentley).
Good new resources for researchers, very useful and cool:
https://elicit.org/
Explainpaper

A better way to read academic papers.

New preprint: Signature and Log-signature for the Study of Empirical Distributions Generated with GANs #GenerativeAdversarialNetworks #SignatureTransform
https://arxiv.org/abs/2203.03226
Signature and Log-signature for the Study of Empirical Distributions Generated with GANs

In this paper, we bring forward the use of the recently developed Signature Transform to measure the similarity between image distributions, and provide a detailed introduction and extensive evaluations. We pioneer the use of RMSE and MAE on the signature, and on the log-signature, as alternatives for measuring GAN convergence, a problem that has been extensively studied. We also introduce efficient and effective analytical measures, based on statistics, to study the goodness of fit of the GAN sample distribution. Current GAN measures involve heavy computation, normally performed on the GPU, and are very time consuming. In contrast, our measures run on the CPU and reduce computation time to the order of seconds, while achieving the same level of goodness. Lastly, a PCA-adaptive t-SNE approach, which is novel in this context, is also proposed for data visualization.
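The RMSE-on-signature idea can be sketched with NumPy for a depth-2 signature of a piecewise-linear path. This is a minimal illustration only: the truncation depth and the way image distributions are encoded as paths are assumptions here, not the paper's exact construction.

```python
import numpy as np

def signature_level2(path):
    """Depth-2 signature of a piecewise-linear path given as a (T, d) array.
    Level 1: total increments. Level 2: iterated integrals, segment by segment."""
    dx = np.diff(path, axis=0)                  # per-segment increments, (T-1, d)
    s1 = dx.sum(axis=0)                         # level-1 terms
    # cumulative increment *before* each segment (Chen's relation)
    prefix = np.vstack([np.zeros(path.shape[1]), np.cumsum(dx, axis=0)[:-1]])
    # S2[i, j] = sum_k prefix[k, i] * dx[k, j] + 0.5 * dx[k, i] * dx[k, j]
    s2 = prefix.T @ dx + 0.5 * dx.T @ dx
    return np.concatenate([s1, s2.ravel()])

def rmse_signature(a, b):
    """RMSE between the truncated signatures of two paths."""
    return float(np.sqrt(np.mean((signature_level2(a) - signature_level2(b)) ** 2)))
```

Because the signature is invariant to time reparameterization, refining the sampling of the same underlying path leaves it unchanged, which is what makes it a stable comparison statistic.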

arXiv.org
Great resource:
Intro to Robotics at Princeton https://www.youtube.com/@intro-to-robotics/videos
Introduction to Robotics @ Princeton

Lectures from "Introduction to Robotics" at Princeton University (MAE/ECE 345, COS 346, MAE 549). Instructor: Anirudha Majumdar (irom-lab.princeton.edu/majumdar). Other course materials (notes, slides, etc.): https://irom-lab.princeton.edu/intro-to-robotics

Course description: Robotics is a rapidly growing field with applications including unmanned aerial vehicles, autonomous cars, and robotic manipulators. This course will provide an introduction to the fundamental theoretical and algorithmic principles behind robotic systems. The course will also allow students to get hands-on experience through project-based assignments with the Crazyflie quadrotor. Topics include: feedback control; motion planning; state estimation, localization, and mapping; computer vision and learning. Broader topics: robotics and the law, ethics, and economics. This course is aimed at undergraduate students (primarily juniors and seniors). The graduate-level track (MAE 549) is aimed at first-year PhD students.

YouTube
Summarization of Videos with the Signature Transform

This manuscript proposes a new benchmark to assess the goodness of visual summaries without the need for human annotators. It is based on the Signature Transform, specifically on the RMSE and MAE of the signature and log-signature, and builds on the assumption that uniform random sampling can provide accurate summarization capabilities. First, we introduce a preliminary baseline for automatic video summarization that has at its core a Vision Transformer, an image-text model pre-trained with contrastive learning (CLIP), and an object-detection module. Our baseline leverages video text descriptions to determine the most frequent nouns to use as anchors, and then performs an open-vocabulary image search on the video frames. This enables zero-shot text-conditioned object detection to select the frames for the final video summary. Despite requiring no fine-tuning, our approach provides accurate summaries on a wide range of video data. Since few datasets are available for this task, a new dataset consisting of videos from YouTube and the corresponding automatic audio transcriptions is provided. Then, a state-of-the-art technique, based on the harmonic components that the Signature Transform is able to capture, is proposed; it achieves compelling accuracy and outperforms previous methodologies. The analytical measures are extensively evaluated, and we conclude that they correlate very well with the notion of a good summary.
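The uniform-random-sampling assumption the benchmark builds on can be sketched in a few lines. This is a hypothetical helper for illustration, not the paper's code:

```python
import numpy as np

def uniform_summary_indices(n_frames, k, seed=0):
    """Pick k distinct frame indices uniformly at random from a video of
    n_frames frames, sorted into playback order, as a summarization baseline."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_frames, size=k, replace=False))
```

A candidate summary can then be compared against such random draws with the signature-based RMSE/MAE measures: a good summary should score at least as well as uniform sampling does on average.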

figshare