1/20th of civitai user prompts added as 300Mb safetensor file to the CLIP interrogator
Image shows the list of prompt items before/after running 'remove duplicates' on a subset of the Adam Codd huggingface repo of civitai prompts: https://huggingface.co/datasets/AdamCodd/Civitai-2m-prompts/tree/main

Link to the notebook here: https://huggingface.co/datasets/codeShare/fusion-t2i-generator-data/blob/main/Google%20Colab%20Jupyter%20Notebooks/fusion_t2i_CLIP_interrogator.ipynb

Removing duplicates from the civitai prompts results in a roughly 90% reduction in items: from 4.8 million down to 0.417 million. If you wish to search this set, you can use the notebook above.

Unlike the typical pharmapsychotic CLIP interrogator, I pre-encode the text corpus ahead of time. Additionally, I quantize the text encodings, storing them as unsigned integers (torch.uint8) instead of float32, using this formula: https://lemmy.world/pictrs/image/6c03bb16-66f1-458e-8124-f68ec8ef1f01.png

For the CLIP encodings I use scale 0.0043. The TLDR is that you divide the float32 value by the scale 0.0043, round to the nearest integer, and then add a zero_point so that all values within the encoding are non-negative, i.e. quantized = round(value / scale) + zero_point. A typical zero_point for a given encoding can be 0, 30, 120 or 250-ish. In summary, a pretty useful setup for me when I need prompts for stuff.

I also have a 1.6 million item fanfiction set of tags loaded from https://archiveofourown.org/ . It is mostly character names. They are listed as fanfic1 and fanfic2 respectively.

Upcoming plans: include a visual representation of the text_encodings as colored cells within a 16x16 grid.
A color is an RGB value (3 integer values) within a given range, and 3 x 16 x 16 = 768, which happens to be the dimension of the CLIP text encoding.

That's all for this update.
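For the curious, here is a minimal numpy sketch of the quantization scheme described above, plus the planned 16x16 RGB grid. The scale 0.0043 and the per-encoding zero_point are from the post; the random "encoding" and the helper names are assumptions for illustration only (the real values come from the CLIP text encoder, and the notebook stores them as torch.uint8 rather than numpy arrays):

```python
import numpy as np

SCALE = 0.0043  # scale used for the CLIP text encodings

def quantize(x: np.ndarray, scale: float) -> tuple[np.ndarray, int]:
    """Map float32 values to uint8: q = round(x / scale) + zero_point."""
    q = np.round(x / scale).astype(np.int64)
    zero_point = int(-q.min())          # shift so every value is >= 0
    q = q + zero_point
    assert q.max() <= 255, "scale too small for the uint8 range"
    return q.astype(np.uint8), zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float32 values: x ~ (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale

# Stand-in for one 768-dim CLIP text encoding (float32).
rng = np.random.default_rng(0)
encoding = rng.normal(0.0, 0.02, size=768).astype(np.float32)

q, zp = quantize(encoding, SCALE)
restored = dequantize(q, SCALE, zp)
max_err = float(np.abs(encoding - restored).max())  # at most scale/2 per value

# The planned visualization: 768 uint8 values reshape exactly into a
# 16x16 grid of RGB cells, since 3 * 16 * 16 = 768.
grid = q.reshape(16, 16, 3)
```

Storing uint8 instead of float32 cuts the corpus size by 4x, and the round-trip error per value is bounded by half the scale (about 0.00215 here). A nice side effect of the uint8 format is that the 768 values already sit in the 0-255 range an RGB image expects, so the grid view is just a reshape.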