Say hello to #InstructPix2Pix - the #DeepLearning model that edits images based on human instructions!

Trained on synthetic data, it outperforms baseline AI image-editing models.

Discover the magic of InstructPix2Pix on #InfoQ: https://bit.ly/44EO2B1

#AI #ML #ComputerVision

Berkeley Open-Sources AI Image-Editing Model InstructPix2Pix

Researchers from the Berkeley Artificial Intelligence Research (BAIR) Lab have open-sourced InstructPix2Pix, a deep-learning model that follows human instructions to edit images. InstructPix2Pix was trained on synthetic data and outperforms a baseline AI image-editing model.


Just found out it's possible to merge the #InstructPix2Pix and #riffusion models using the recipe in the image attached to this post.

And the most interesting part: the resulting InstructPix2Pix-riffusion model indeed still only outputs spectrograms. The results are otherwise not that good (I guess the GPT-3 component of InstructPix2Pix was not optimized for spectrograms), but it's still interesting that this merge kinda works.
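The core idea behind such merge recipes is a weighted average of the two checkpoints' parameters. A minimal sketch of that idea, with illustrative layer names and plain floats standing in for real torch state-dict tensors:

```python
# Rough sketch of a weighted checkpoint merge: every parameter shared by
# both models is linearly interpolated. Layer names and values here are
# illustrative placeholders, not real InstructPix2Pix/riffusion weights.

def merge_checkpoints(ckpt_a, ckpt_b, alpha=0.5):
    """Blend two checkpoints: alpha=0 keeps A, alpha=1 keeps B."""
    merged = {}
    for name, value_a in ckpt_a.items():
        if name in ckpt_b:
            merged[name] = (1 - alpha) * value_a + alpha * ckpt_b[name]
        else:
            merged[name] = value_a  # layer only exists in checkpoint A
    return merged

ip2p = {"unet.down.0": 0.25, "unet.up.0": -0.5}
riffusion = {"unet.down.0": 0.75, "unet.up.0": 1.0}
print(merge_checkpoints(ip2p, riffusion))
# {'unet.down.0': 0.5, 'unet.up.0': 0.25}
```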

#StableDiffusion

#instructPix2Pix (ip2p) is finally working again in the most recent version of the automatic1111-webui.

Just update your webui installation with git pull, then update your ip2p extension from the Extensions tab of the webui.

If you had a depth model loaded before, switch to your ip2p model and then restart the webui. This ensures the depth model is unloaded and avoids out-of-memory errors while using ip2p.
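The git pull steps above can be sketched in a few lines of Python; the directory names are illustrative, so point them at your own install (updating the extension via the Extensions tab has roughly the same effect as pulling its repo directly):

```python
# Minimal sketch of updating a webui checkout and an extension with git.
# The paths below are illustrative placeholders, not canonical locations.
import subprocess
from pathlib import Path

WEBUI = Path("stable-diffusion-webui")
IP2P_EXT = WEBUI / "extensions" / "instruct-pix2pix"

def git_pull(repo: Path) -> str:
    """Run `git pull` inside `repo` and return git's stdout."""
    result = subprocess.run(
        ["git", "pull"], cwd=repo,
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# git_pull(WEBUI)     # update the webui itself
# git_pull(IP2P_EXT)  # update the ip2p extension
```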

#StableDiffusion

Update: oh well, it turns out #instructpix2pix can actually run on systems with 6GB of VRAM and doesn't need 18GB of VRAM. This is huge news.

We now have proof that a #StableDiffusion model with an LLM component can indeed run locally on ordinary PCs. I think it's now only a matter of time until it's also possible to run a full LLM on 6GB of VRAM or less.

So there's now another very interesting new #StableDiffusion model out there named #InstructPix2Pix, or, to be more precise, a model that merges Stable Diffusion with a version of #GPT3, meaning with a large language model(!) (https://huggingface.co/timbrooks/instruct-pix2pix/tree/main).

This model modifies images using natural-language prompts similar to what you would use in ChatGPT (e.g. "What would it look like with rain?", "Add fireworks").
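If you want to try the released checkpoint outside a webui, here is a minimal sketch using Hugging Face diffusers. It assumes the diffusers, torch, and Pillow packages and a CUDA GPU; the pipeline class and model id are real, while the input file name is a placeholder:

```python
# Minimal sketch of running the released checkpoint with Hugging Face
# diffusers. Imports are kept inside the function so this file loads
# even without torch/diffusers installed.

def edit_image(image_path: str, instruction: str, steps: int = 20):
    """Apply a natural-language edit instruction to an image."""
    import torch
    from diffusers import StableDiffusionInstructPix2PixPipeline
    from PIL import Image

    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
    ).to("cuda")
    image = Image.open(image_path).convert("RGB")
    # image_guidance_scale > 1 keeps the result close to the input image
    result = pipe(
        instruction, image=image,
        num_inference_steps=steps, image_guidance_scale=1.5,
    )
    return result.images[0]

# edit_image("input.png", "Add fireworks")
```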


» Replacing clothes using AI in video. #InstructPix2Pix #EbSynth : #StableDiffusion
https://t.co/k71pfP0PDh

Posted in r/StableDiffusion by u/SweetEliz • 134 points and 13 comments
