Hidéo Snes

28 Followers
105 Following
465 Posts
AI, art, social responsive design & IT.
💛💜 Enby-Ace (np¦name), DnD

RT @osanseviero
💫StarCoder: May the Source be with You!

🔥15B LLM with 8k context
🥳Trained on permissively-licensed code
💻Acts as tech assistant
🤯80+ programming languages
🚀Open source and data
💫Online demos
🧑‍💻VSCode plugin
🪅1 trillion tokens

Follow the thread for this amazing release! 🧵
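
The thread has the full details; for quick reference, here is a minimal sketch of running StarCoder for code completion with Hugging Face transformers. The bigcode/starcoder checkpoint name is the published one on the Hub, but the repo is gated, so the license must be accepted there first.

# Minimal sketch: code completion with StarCoder via transformers.
# Note: 15B parameters — needs a large GPU, or device_map="auto"
# (requires the accelerate package) to spread across devices.
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigcode/starcoder"  # gated repo: accept the license on the Hub first
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs.input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))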

RT @SmokeAwayyy
❗Leaked Internal Google Document Claims Open Source AI Will Outcompete Google and OpenAI

https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

Google "We Have No Moat, And Neither Does OpenAI"

RT @_akhaliq
DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling

abs: https://arxiv.org/abs/2305.01257

We introduce DreamPaint, a framework to intelligently inpaint any e-commerce product on any user-provided context image. The context image can be, for example, the user's own photo for virtual try-on of clothes from the e-commerce catalog, or an image of the user's room for virtually placing a piece of furniture from the catalog. As opposed to previous augmented-reality (AR)-based virtual try-on methods, DreamPaint neither uses nor requires 3D modeling of either the e-commerce product or the user context. Instead, it directly uses 2D images of the product as available in the product catalog database, and a 2D picture of the context, for example taken with the user's phone camera. The method relies on few-shot fine-tuning of a pre-trained diffusion model with the masked latents (e.g., Masked DreamBooth) of the catalog images per item, whose weights are then loaded into a pre-trained inpainting module that is capable of preserving the characteristics of the context image. DreamPaint preserves both the product image and the context (environment/user) image without requiring text guidance to describe the missing part (product/context). DreamPaint can also intelligently infer the best 3D angle of the product to place at the desired location in the user context, even if that angle was previously unseen in the product's reference 2D images. We compare our results against both text-guided and image-guided inpainting modules and show that DreamPaint yields superior performance in both a subjective human study and quantitative metrics.
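
No code is linked in the post, so as a rough sketch of just the inpainting half — the per-item Masked-DreamBooth fine-tuning the abstract describes is omitted — here is what the 2D compositing step could look like with a generic pre-trained inpainting pipeline from diffusers. The checkpoint, file names, and placeholder prompt are all stand-ins, not the paper's actual setup.

# Sketch only: generic 2D inpainting of a product into a context image.
# DreamPaint additionally few-shot fine-tunes on masked latents of the
# catalog images (Masked DreamBooth); that step is not shown here.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # generic stand-in checkpoint
    torch_dtype=torch.float16,
).to("cuda")

context = Image.open("user_room.png").convert("RGB").resize((512, 512))  # hypothetical file
mask = Image.open("placement_mask.png").convert("L").resize((512, 512))  # white = region to fill

# In DreamPaint the product identity comes from the fine-tuned weights,
# not from text; a placeholder token stands in for it here.
result = pipe(prompt="a sks armchair", image=context, mask_image=mask).images[0]
result.save("composited.png")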

RT @amygoodchild
Looking to blend colors in your generative artwork? I've explored a few methods for crafting the ideal gradient.

Check out the comparisons
https://editor.p5js.org/amygoodchild/sketches/T2jm6DBz9

Press 'c' to cycle through different colour pairs, or add your own hex codes to the array.
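
The sketch itself is p5.js; for a language-neutral version of the core idea, here is a small Python snippet that linearly interpolates between two hex colours. The thread's point is that naive RGB lerp is only one of several options, and often not the prettiest.

# Sketch: build a simple gradient by linearly interpolating two hex colours.
# Naive RGB interpolation can produce dull midpoints; blending in another
# colour space (as the p5.js sketch compares) often looks better.
def hex_to_rgb(h):
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def lerp_color(a, b, t):
    return tuple(round(a[i] + (b[i] - a[i]) * t) for i in range(3))

start, end = hex_to_rgb("#ffd700"), hex_to_rgb("#6a0dad")  # gold -> purple
steps = 8
gradient = [lerp_color(start, end, i / (steps - 1)) for i in range(steps)]
print(gradient)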

RT @_akhaliq
Unlimiformer: Long-Range Transformers with Unlimited Length Input

demonstrate Unlimiformer’s efficacy on several long-document and multi-document summarization benchmarks, showing that it can summarize even 350k token-long inputs from the BookSum dataset, without any input… https://twitter.com/i/web/status/1653577349374304256
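
The core trick, per the paper, is to replace full cross-attention over an unbounded input with k-nearest-neighbor retrieval over encoder hidden states, so each decoder query attends only to its top-k keys. A toy numpy sketch of that retrieval step, with made-up shapes and the kNN index replaced by brute-force top-k:

# Toy sketch of Unlimiformer-style retrieval attention: each decoder query
# attends only to its k best-matching encoder states instead of all of them.
# A real system uses an approximate kNN index (e.g. FAISS); keys double as
# values here to keep the sketch short.
import numpy as np

rng = np.random.default_rng(0)
d, n_keys, k = 64, 100_000, 16           # hidden size, input length, retrieved keys
keys = rng.standard_normal((n_keys, d))  # encoder hidden states
query = rng.standard_normal(d)           # one decoder attention query

scores = keys @ query / np.sqrt(d)       # scaled dot-product relevance
top = np.argpartition(scores, -k)[-k:]   # indices of the k best-scoring keys

weights = np.exp(scores[top] - scores[top].max())
weights /= weights.sum()                 # softmax over the retrieved subset only
context = weights @ keys[top]            # attention output over just k states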

RT @Oranguerillatan
Tests with the latest #warpfusion are coming along nicely, numerous #ControlNet flavours mixed in.

Kudos to @devdef always for his incredible tool and generous support community on Discord 🦧🦍🙏

My GPU survived the warp-ocalypse!💥

#GenerativeAI #AIArtistCommunity #AIArtworks https://twitter.com/i/web/status/1653493879956557834
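
WarpFusion itself is distributed as notebooks and no code is shared in the tweet; as a hedged illustration of the "mixing ControlNet flavours" part only, here is how stacking two ControlNet conditionings looks with the diffusers library. The checkpoints and input files are assumptions, not WarpFusion's actual pipeline.

# Sketch: combining multiple ControlNet conditionings on one generation.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny",
                                    torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth",
                                    torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")

canny_map = Image.open("edges.png")   # hypothetical precomputed control images
depth_map = Image.open("depth.png")

frame = pipe(
    "an orangutan in a jungle, cinematic lighting",
    image=[canny_map, depth_map],              # one conditioning image per ControlNet
    controlnet_conditioning_scale=[1.0, 0.6],  # per-ControlNet mixing weights
).images[0]
frame.save("frame.png")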

RT @_akhaliq
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

abs: https://arxiv.org/abs/2305.00787
project page: https://genefaceplusplus.github.io/

Generating talking-person portraits from arbitrary speech audio is a crucial problem in the field of digital humans and the metaverse. A modern talking face generation method is expected to achieve generalized audio-lip synchronization, good video quality, and high system efficiency. Recently, the neural radiance field (NeRF) has become a popular rendering technique in this field, since it can achieve high-fidelity and 3D-consistent talking face generation from a training video only a few minutes long. However, several challenges remain for NeRF-based methods: 1) for lip synchronization, it is hard to generate long facial motion sequences with high temporal consistency and audio-lip accuracy; 2) for video quality, because of the limited data used to train the renderer, it is vulnerable to out-of-domain input conditions and occasionally produces bad rendering results; 3) for system efficiency, the slow training and inference speed of the vanilla NeRF severely obstructs its use in real-world applications. In this paper, we propose GeneFace++ to handle these challenges by 1) utilizing the pitch contour as an auxiliary feature and introducing a temporal loss in the facial motion prediction process; 2) proposing a landmark locally-linear-embedding method to regulate outliers in the predicted motion sequence and avoid robustness issues; 3) designing a computationally efficient NeRF-based motion-to-video renderer to achieve fast training and real-time inference. With these settings, GeneFace++ becomes the first NeRF-based method that achieves stable, real-time talking face generation with generalized audio-lip synchronization. Extensive experiments show that our method outperforms state-of-the-art baselines in both subjective and objective evaluation. Video samples are available at https://genefaceplusplus.github.io .
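
The abstract doesn't spell out the temporal loss, so the following is an assumption: a common formulation penalizes frame-to-frame velocity mismatch between the predicted and ground-truth motion sequences, sketched here in PyTorch.

# Assumption: the abstract only says "a temporal loss" is added to facial
# motion prediction; a typical form penalizes mismatched frame-to-frame
# deltas, encouraging temporally consistent motion.
import torch
import torch.nn.functional as F

def motion_loss(pred, target, temporal_weight=0.5):
    """pred, target: (batch, frames, landmark_dims) motion sequences."""
    recon = F.mse_loss(pred, target)             # per-frame accuracy
    pred_vel = pred[:, 1:] - pred[:, :-1]        # frame-to-frame deltas
    target_vel = target[:, 1:] - target[:, :-1]
    temporal = F.mse_loss(pred_vel, target_vel)  # smoothness/consistency term
    return recon + temporal_weight * temporal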

proximasanfinetuning/luna-diffusion-v2.5 · Hugging Face
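
Assuming the linked repo is a standard Stable Diffusion checkpoint (not verified here), it should load with the usual diffusers pattern:

# Sketch: loading the linked checkpoint with diffusers, assuming it is a
# standard Stable Diffusion model repo.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "proximasanfinetuning/luna-diffusion-v2.5",
    torch_dtype=torch.float16,
).to("cuda")
image = pipe("portrait in soft painterly light").images[0]
image.save("luna_sample.png")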

RT @_akhaliq
ArK: Augmented Reality with Knowledge Interactive Emergent Ability

abs: https://arxiv.org/abs/2305.00970
project page: https://augmented-reality-knowledge.github.io/

Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high-quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amounts of data for model training for every new task. This process is costly, or even impossible, for many domains. In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g. GPT4, DALLE) to novel domains or scenarios for scene understanding and generation in the physical or virtual world. The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK), which leverages knowledge memory to generate scenes in unseen physical-world and virtual-reality environments. The knowledge interactive emergent ability (Figure 1) is demonstrated as the agent learns i) micro-actions of cross-modality: using multi-modality models to collect a large amount of relevant knowledge-memory data for each interaction task (e.g., unseen scene understanding) from the physical reality; and ii) macro-behavior of reality-agnosticism: improving interactions in mixed-reality environments that are tailored to different characterized roles, target variables, collaborative information, and so on. We validate the effectiveness of ArK on scene generation and editing tasks. We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes compared to baselines, demonstrating the potential benefit of incorporating ArK into generative AI for applications such as the metaverse and gaming simulation.

RT @StarkMakesArt
🧵 on Pose Stealing - A Three-Step Workflow by Me. Ever want to make a specific pose with a specific character in #midjourney without burning fast time in the casino to maybe get a decent result? This is the thread for you!