A user is looking for a high-quality local AI model for NSFW content, running on a 4060 Ti (8GB VRAM) and 32GB RAM via Oobabooga. The models they have tried are either too slow (Llama_3.x_70b) or uncensored but short of DeepSeek-level quality. Does anyone have suggestions for a suitable model?
#AILocal #NSFWModel #4060Ti #Oobabooga #LLM #AIViệtNam

https://www.reddit.com/r/LocalLLaMA/comments/1nyqrjd/looking_for_a_highquality_local_nsfw_model_4060/

Have updated my local #LLM server from #Mistral 8B to #Qwen3 30B A3B. Still very fast. It's a "thinking" model, so it tends to follow instructions and handle prompts with copious amounts of data (#HomeAssistant reports) better.

This is all done on CPU, too. 20 cores, but still. If you have #HomeLab equipment needing a use, I highly recommend setting up #oobabooga and slapping one of these bad bois in there so you can call it via API in your other projects.
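
If you want to wire it into other projects, here's a minimal sketch of calling it over the API. It assumes the server was started with the --api flag and is listening on the default OpenAI-compatible endpoint on port 5000; adjust host, port, and the placeholder prompt to your setup.

```python
# Minimal sketch: query a text-generation-webui (oobabooga) server over its
# OpenAI-compatible API. Assumes the server was launched with --api on the
# default port 5000; the prompt is just a placeholder.
import requests

API_URL = "http://127.0.0.1:5000/v1/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "Summarize this Home Assistant report: ..."}
    ],
    "max_tokens": 256,
    "temperature": 0.7,
}

response = requests.post(API_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```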

@Mawoka @kellogh I run #Mistral #7B Instruct via CPU on a #Debian #Linux server via #API through #oobabooga. I didn't need realtime responses, and I had spare CPU power, so it worked out perfectly.

Are there any ‘multimodal’ #AI frontends for #SelfHosting out there? Currently I use #oobabooga, which is really cool but it’s text-only.

I mean something like the new GPT-4o, which lets you upload PDFs and photos, access websites, etc., in the chat.

Also, would the models need to be prepared for this or are the same models usable?

In this guide I explain how to use SillyTavern on Arch Linux locally: https://spacebums.co.uk/sillytavern/

#sillytavern #AI #archlinux #guide #oobabooga #roleplay #linux

SillyTavern

In this guide, I’ll cover the process which I used to download and install all the necessary requirements to run SillyTavern on Arch Linux. If you have a moderately powerful PC and an NVIDIA GPU, then, by the end of this guide, you will be able to use your microphone to do voice chat with an AI character, then have them respond back with a lifelike voice of their own. You can go on virtual adventures together and even have more than one AI character active, so they can not only talk to you - but with each other!

Hey fellow #AI / #LLM #nerds, I've discovered a weird issue with some LLMs I've been foolin' around with. I'm using #oobabooga, and some models seem to just quit when formatting a code block.

```markdown
```

For example, the above is what *some* models do. Others don't seem to care. Something about the ``` token makes it give up. Seen anything like this?
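
For anyone who wants to poke at this, here's a rough repro sketch against the oobabooga OpenAI-compatible API (same assumptions as usual: server started with --api, default port). The idea is to ask for a fenced block and check finish_reason: if it comes back "stop" right after the opening fence, the ``` is probably being treated as a stop string somewhere in the pipeline.

```python
# Hypothetical repro: ask for a fenced code block and inspect where/why
# generation stopped. Endpoint and port are assumptions about the setup.
import requests

API_URL = "http://127.0.0.1:5000/v1/chat/completions"

payload = {
    "messages": [{
        "role": "user",
        "content": "Print hello world in Python, inside a fenced Markdown code block.",
    }],
    "max_tokens": 200,
}

choice = requests.post(API_URL, json=payload, timeout=120).json()["choices"][0]
print("finish_reason:", choice.get("finish_reason"))
print(repr(choice["message"]["content"]))  # repr() makes a bare trailing fence easy to spot
```

Worth double-checking the "Custom stopping strings" setting in the UI too, in case the fence snuck in there.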

I found a really good model!
It’s really impressively good, especially for something I can run on my own machine!

Here’s a link: https://huggingface.co/lmsys/vicuna-13b-v1.5-16k

I don’t have enough VRAM for it in the default settings, but it works just great in 8-bit mode.
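
For reference, here's roughly what 8-bit loading looks like if you drive it from Python directly with transformers + bitsandbytes (this is my understanding of what the webUI's 8-bit option does under the hood; treat it as a sketch, not the webUI's exact code path):

```python
# Sketch: load Vicuna 13B in 8-bit via bitsandbytes. Needs the transformers,
# accelerate, and bitsandbytes packages plus a CUDA GPU; 13B in 8-bit is
# roughly 13GB of weights.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "lmsys/vicuna-13b-v1.5-16k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # ~1 byte/param
    device_map="auto",  # spills layers to CPU RAM if VRAM runs short
)

# Vicuna v1.5 uses a USER:/ASSISTANT: chat format.
inputs = tokenizer("USER: Hello!\nASSISTANT:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```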

They have smaller versions here too: https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md

I tried the 13b and 13b-16k models; the 13b-16k seems to understand my requests slightly better, so that’s what I’ll use for now!

(also I just noticed that it’s only 24GB in storage? Wtf?)

It’s WAY better than anything else I’ve ever tried running locally, so I’m really impressed by it! It’s actually usable and pretty great!

(If you know of any even better models that can run on a 4090, do let me know!)

#AI #LLM #GPT #GPT4All #oobabooga #huggingface

What should one consider when choosing a model in #Huggingface?

Just the number of parameters and the file size (i.e., the bigger these two are, the better the model)?

Which would be the best model in there? I tried running some models locally, but they’re so much worse (“dumber”) than ChatGPT; I wanted to run something closer to it. Maybe the models I chose just weren’t big enough.

What is the “smarter”/biggest model in there?

I have an empty SSD I could use if the model is really big, and I have a good GPU, so that’s not a problem either, as long as the model doesn’t need like 10 GPUs to run.

Edit: okay apparently some models need like 300GB of VRAM to run, so lemme ask this differently:

What is the biggest, best model I can run on an RTX 4090? (24GB VRAM)

(64GB RAM)

Edit 2: apparently there’s this bitsandbytes thing that could help?
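
For anyone else sizing models: a rough rule of thumb is bytes-per-parameter times parameter count, just for the weights (the KV cache and activations eat more on top). A quick back-of-the-envelope script, with approximate figures:

```python
# Approximate VRAM needed for the weights alone, by precision.
# fp16 = 2 bytes/param, 8-bit = 1, 4-bit = 0.5 (all rough figures).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(params_billion: float, dtype: str) -> float:
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1024**3

for size in (7, 13, 33, 70):
    print(f"{size}B:", {d: round(weight_gb(size, d), 1) for d in BYTES_PER_PARAM})
```

By that math a 13B model is ~24GB in fp16, ~12GB in 8-bit (which is where bitsandbytes comes in), and ~6GB in 4-bit, so on a 24GB 4090 you can fit 13B in 8-bit and something in the ~30B class in 4-bit.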

(Probably irrelevant but I’ll use the #oobabooga webUI)

#AI #LLM #GPT #GPT4all

I went to try Mixtral 8x7B on my GPUs: 4 x 24GB Tesla P40s

https://mistral.ai/news/mixtral-of-experts/

I load the whole thing onto my GPUs, all ready, everything set,
type "hello AI how are you?" into the oobabooga text generator,
click SEND

*screens go black*
*computer starts beeping like cray cray*

⠀⠀⠀⠀⠀⠀⠀⣀⣤⣶⡶⠶⠶⠦⣄⠀⠀
⠀⠀⠀⠀⠀⢠⣾⡿⠋⠁⠀⠀⠀⠀⡨⠂⠀⠀⠀⠀
⠀⠀⠀⠀⢀⣾⣿⣧⠀⢠⠔⣐⡀⠈⠶⠶⠄⠀⠀⠀
⠀⠀⠀⠀⢸⣿⣿⣿⡄⠀⠈⠙⠁⡆⠑⢠⠈⡀⠀⠀
⠀⠀⠀⠀⠈⢿⣿⣍⠃⠀⠀⠀⣀⠀⠁⡀⠁⣇⣀⣀
⠀⣠⣤⣶⣾⣿⣿⣿⣭⠀⠀⠀⠀⠈⠉⠁⠀⣿⣿⣿
⠿⠿⠿⠿⠿⠿⠿⠿⠖⠌⠐⠂⠀⢀⡀⠠⠿⠿⠿⠿

#LLM #AI #oobabooga #Mixtral

LLM for GTX1080 (8GB) for local use.

https://lemmy.dbzer0.com/post/7095567

Could someone recommend an LLM for the Nvidia GTX1080? I’ve used TheBloke’s gptq_model-4bit-128g of Luna AI, and I get a response every 30-60s and only 4-5 prompts before it starts to repeat or hallucinate.
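
Not a model recommendation, but on the repetition: bumping repetition_penalty and keeping the context inside the model's window sometimes helps with small 4-bit models. A hedged sketch via the oobabooga OpenAI-compatible API (assumes the server runs with --api on the default port; repetition_penalty and truncation_length are webUI generation parameters that, to my knowledge, get passed through the request body):

```python
# Sketch: generation settings that can tame looping on small GPTQ models.
# Endpoint, port, and parameter pass-through are assumptions about the setup.
import requests

API_URL = "http://127.0.0.1:5000/v1/chat/completions"

payload = {
    "messages": [{"role": "user", "content": "Tell me a short story."}],
    "max_tokens": 200,
    "temperature": 0.8,
    "repetition_penalty": 1.15,  # >1.0 penalizes recently used tokens
    "truncation_length": 2048,   # stay inside the model's context window
}

result = requests.post(API_URL, json=payload, timeout=120).json()
print(result["choices"][0]["message"]["content"])
```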