Okay - #stablediffusion #sdxl (only the real thing with its two CLIP processors!) also has certain problems with the relations between objects: "A robot painting a unicorn".

(More on the "unicorn challenge" and what it teaches us about the #ki image generators #stablediffusion, #deepfloyd, #dalle2 and #midjourney: https://www.janeggers.tech/eeblog/2023/ki-tabu-runde-2-einhoerner-malende-roboter-sind-schwierig/)

AI Taboo, Round 2: Unicorn-painting robots are hard! | janeggers.tech - The future's so bright, you gotta wear shades

Hard: persuading an AI image generator to put two objects into the right relationship - and why Deepfloyd is better at this than Midjourney

Text-to-picture is now available #deepfloyd
Anyway, you can try it out for yourself here - best of luck: https://huggingface.co/spaces/DeepFloyd/IF
#deepfloyd #aiart
IF - a Hugging Face Space by DeepFloyd


Decided to give DeepFloyd a try today on macOS.

The good news? It works … kinda 😛

The bad news? It doesn’t work all the way … as was to be expected 🙂

I took the following code from their GitHub repo (https://github.com/deep-floyd/IF) and modified it for an Apple Silicon (M1) Mac. Here’s the actual code I ran:

from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-M-v1.0").to("mps")

# stage 2
stage_2 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-II-M-v1.0", text_encoder=None).to("mps")

# stage 3
safety_modules = {
    "feature_extractor": stage_1.feature_extractor,
    "safety_checker": stage_1.safety_checker,
    "watermarker": stage_1.watermarker,
}
stage_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", **safety_modules).to("mps")

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

generator = torch.manual_seed(0)

# stage 1
image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("./if_stage_I.png")

# stage 2
image = stage_2(image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("./if_stage_II.png")

# stage 3
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")

You have to make sure that diffusers, transformers, and accelerate (at least in my own trial) are fully up to date. The larger models would probably work too, but they took too long to download and test, so I opted for the smallest models.
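A quick, stdlib-only way to check what versions you actually have installed (my own sketch, not part of the original script):

```python
import importlib.metadata as md

def pkg_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return md.version(name)
    except md.PackageNotFoundError:
        return None

for pkg in ("diffusers", "transformers", "accelerate"):
    v = pkg_version(pkg)
    print(pkg, v if v else "NOT INSTALLED")
```

If any of the three prints "NOT INSTALLED" or an old version, upgrade it before running the pipeline.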

Stage I and II generated images, but stage III errored out. I will need to figure out what happened there later …
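Before digging into the stage III error, one sanity check worth running on any Mac (again my own addition, not from the repo): confirm that PyTorch can actually see the MPS backend, since every pipeline above is moved there with .to("mps").

```python
import torch

# Check whether this PyTorch build includes MPS support at all,
# and whether the backend is usable on this machine right now.
mps_ok = torch.backends.mps.is_available()
device = "mps" if mps_ok else "cpu"
print("MPS built into this PyTorch:", torch.backends.mps.is_built())
print("MPS available at runtime:", mps_ok)
print("device to use:", device)
```

If this falls back to "cpu", the .to("mps") calls in the script will fail before you ever reach stage III.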

Resulting images are attached …

#DeepLearning #MachineLearning #DeepFloyd #ImageGeneration


Stage I image — 64 x 64 in size
Stage II image — 256 x 256 in size
GitHub - deep-floyd/IF



OMG!

The AI-generated image at 10:00 in this YouTube video.

HINT: I shouldn't watch these when I'm hungry.

#AI #ArtificialIntelligence #DeepFloyd

https://youtu.be/4Zkipll5Rjc

Midjourney has COMPETITION & it's FREE/Open Source - Deepfloyd IF AI Art Model


The next level of #AIArt is here today. Behold their hands and despair!

#AI #DeepFloyd