Could an #LLM that uses the same encoder/tokenization scheme as a #diffusers image model be used to improve the diffusion model's understanding of concepts? The LLM should have a broader understanding overall than the limited associations a diffusion model picks up when trained only on captions.
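
To make the question concrete, here is a toy sketch of one way the two could be connected. It assumes the image model takes in text through cross-attention over per-token hidden states, which is an assumption on my part and not something stated above. The vectors are made-up stand-ins for what a frozen LLM might output, and the learned query/key/value projections a real model would have are left out.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(image_tokens, text_states):
    """Each image token (query) becomes a weighted average of the text states.

    Simplification: the text states serve as both keys and values, and there
    are no learned projection matrices.
    """
    d = len(text_states[0])
    out = []
    for q in image_tokens:
        scores = [dot(q, k) / math.sqrt(d) for k in text_states]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, text_states)) for i in range(d)]
        out.append(mixed)
    return out

# Hypothetical per-token hidden states from a frozen LLM (illustrative values only).
text_states = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical image latent tokens acting as queries.
image_tokens = [[4.0, 0.0], [0.0, 4.0]]

conditioned = cross_attention(image_tokens, text_states)
```

In this setup the image side only ever sees whatever the text states encode, so swapping in richer states from an LLM is the part the question is really about. Whether that actually helps is not something this sketch can show.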