March 2026 - A video (https://doi.org/10.1101/649822) of a brain MRI is slit-scanned with color dispersion and fed to the YOLO object recognition model. Activations in YOLO's 7th backbone layer modulate alpha transparency and luminance in a 3D render of the slit-scanned video as a volume (width × height × time). The accompanying music is made by injecting embeddings from OpenAI's CLIP image-text model, run on the resulting video, into the conditioning pathway of Facebook's MusicGen generative music model.
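The layer tap can be reproduced with a forward hook on the Ultralytics model. Below is a minimal sketch for a single frame, assuming the yolov8n.pt checkpoint and OpenCV for image I/O; layer index 7 follows the description above but differs between YOLO versions, and collapsing channels by averaging into one alpha map is an illustrative choice, not the piece's exact mapping.

```python
# Sketch: capture activations from a YOLO backbone layer and turn them into
# a per-pixel alpha map for one frame of the slit-scanned video.
# Assumptions: Ultralytics yolov8n.pt checkpoint, layer index 7, channel-mean
# as the activation-to-alpha mapping; file names are placeholders.
import cv2
import numpy as np
import torch
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
captured = {}

def hook(module, inputs, output):
    captured["act"] = output.detach()

# model.model.model is the sequence of backbone/head blocks in Ultralytics YOLO
model.model.model[7].register_forward_hook(hook)

frame = cv2.imread("slitscan_frame.png")      # one slice of the slit-scanned volume
model.predict(frame, verbose=False)           # forward pass triggers the hook

act = captured["act"]                         # shape (1, C, h, w)
alpha = act.abs().mean(dim=1, keepdim=True)   # collapse channels to one map
alpha = torch.nn.functional.interpolate(      # upsample to the frame resolution
    alpha, size=frame.shape[:2], mode="bilinear", align_corners=False
)
alpha = (alpha - alpha.min()) / (alpha.max() - alpha.min() + 1e-8)
alpha_map = (alpha[0, 0].cpu().numpy() * 255).astype(np.uint8)
cv2.imwrite("alpha_map.png", alpha_map)       # per-pixel transparency for the render
```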
Slit-scan video of people moving in Amsterdam Centraal railway station in August 2021, rendered as a volume; the transparency of each pixel is determined by activations of a layer of the Ultralytics YOLO model run on the video. The music is generated by injecting activations of a layer of OpenAI's CLIP image-text model, computed on 3-second chunks of this video, into Facebook's MusicGen generative model.
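The per-chunk CLIP features can be sketched as follows, assuming Hugging Face's transformers CLIP (openai/clip-vit-base-patch32) and OpenCV; taking the middle frame of each 3-second chunk is an illustrative choice, and the mechanics of injecting these embeddings into MusicGen's conditioning pathway are not specified here, so only the CLIP side is shown.

```python
# Sketch: one CLIP image embedding per 3-second chunk of the video,
# using the middle frame of each chunk as its representative image.
# Assumptions: openai/clip-vit-base-patch32 via transformers, OpenCV for
# decoding; the video path is a placeholder.
import cv2
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

cap = cv2.VideoCapture("slitscan_video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
chunk_frames = int(fps * 3)                   # frames per 3-second chunk

embeddings = []
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
    if len(frames) == chunk_frames:
        mid = frames[len(frames) // 2]        # representative frame for the chunk
        image = Image.fromarray(cv2.cvtColor(mid, cv2.COLOR_BGR2RGB))
        inputs = processor(images=image, return_tensors="pt")
        with torch.no_grad():
            emb = model.get_image_features(**inputs)   # (1, 512) image embedding
        embeddings.append(emb[0])
        frames = []
cap.release()

clip_embeddings = torch.stack(embeddings)     # (num_chunks, 512), one per 3 s of video
```

These vectors would then have to be projected to the dimensionality MusicGen's conditioner expects before they can stand in for its usual text conditioning, a step that depends on the MusicGen variant used and is left out of this sketch.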