Mastodawn

Sudo su (@sudoingX)

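Rates Hermes agent as more than just hype. The author reports running the Qwen 3.5 27B base model (Q4_K_M, 262K context, 29-35 tokens per second) fully locally on a single RTX 3090, achieving a "my machine, my data" setup, and shares the experience of testing the agent by instructing it to discover its own model.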

https://x.com/sudoingX/status/2030691050868859074

#hermesagent #qwen3.5 #localinference #aiagents #rtx3090

Sudo su (@sudoingX) on X

okay the fuss around hermes agent is not just air. this thing has substance. installed it on a single RTX 3090 running Qwen 3.5 27B base (Q4_K_M, 262K context, 29-35 tok/s). fully local. my machine my data. first thing i did was tell it to discover itself. find its own model

sayzard Mar 3

Sudo su (@sudoingX)

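Going from one RTX 3090 to two leaves the generation speed of the hermes 4.3 36B model almost unchanged (1x 35.3 tok/s, 2x 35.53 tok/s). The extra VRAM goes to context capacity rather than speed. On a single 3090 with Q4_K_M quantization, the model takes 21.8GB, leaving room for up to 32K context (about 22K in practical use).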

https://x.com/sudoingX/status/2028900587719541160

#gpu #llm #quantization #hermes #rtx3090
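The arithmetic behind the figures in the post above can be sketched as follows. This is a rough illustration, not the author's method. The 24 GB VRAM size and the reading of "32K" as 32,768 tokens are assumptions that the post does not state, and the per-token figure is only an upper bound, since it treats all leftover VRAM as context cache and ignores compute buffers.

```python
# Figures quoted in the post.
TOK_S_ONE_GPU = 35.3
TOK_S_TWO_GPUS = 35.53
MODEL_GB = 21.8

# Assumptions (not stated in the post).
VRAM_GB = 24.0              # VRAM size of one RTX 3090
CONTEXT_TOKENS = 32 * 1024  # "32K" read as 32,768 tokens


def speedup_percent(before: float, after: float) -> float:
    """Relative change in throughput, in percent."""
    return (after / before - 1.0) * 100.0


def headroom_gb(vram_gb: float, model_gb: float) -> float:
    """VRAM left over once the quantized weights are loaded."""
    return vram_gb - model_gb


def max_bytes_per_token(free_gb: float, tokens: int) -> float:
    """Upper bound on per-token cache size if all headroom went to context."""
    return free_gb * 1e9 / tokens


if __name__ == "__main__":
    free = headroom_gb(VRAM_GB, MODEL_GB)
    gain = speedup_percent(TOK_S_ONE_GPU, TOK_S_TWO_GPUS)
    budget = max_bytes_per_token(free, CONTEXT_TOKENS)
    print(f"speedup from the second GPU: {gain:.2f}%")
    print(f"VRAM headroom on one GPU: {free:.1f} GB")
    print(f"per-token budget at 32K context: at most {budget / 1024:.0f} KiB")
```

Under these assumptions, the second GPU buys well under one percent of extra speed, while the roughly 2 GB of headroom on one card is what limits the context length.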

sayzard Feb 27

Ivan Fioravanti ᯅ (@ivanfioravanti)

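A brief update reconfirming, after additional tests, that the RTX 3090 is faster. A more detailed performance comparison and analysis will appear in an article to be published tomorrow, and the author plans to keep running more tests.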

https://x.com/ivanfioravanti/status/2027434967576547658

#rtx3090 #gpu #benchmarking #nvidia

Ivan Fioravanti ᯅ (@ivanfioravanti) on X

3090 is faster, there are no doubts. More details in an article tomorrow. I'll keep doing more tests.

sayzard Feb 27

Ivan Fioravanti ᯅ (@ivanfioravanti)

The author is running a CUDA vs Metal performance test on Apple MLX and reports that 23.67GB of the 24GB available is in use. Early results show the RTX 3090 outperforming Metal (or the Apple environment), and more details will be shared soon.

https://x.com/ivanfioravanti/status/2027424459972366641

#apple #metal #cuda #rtx3090 #gpu

Ivan Fioravanti ᯅ (@ivanfioravanti) on X

Pushing hard on this Apple MLX CUDA vs Metal test as you can see... 23.67GB used on 24GB available. RTX 3090 wins by a good margin! Sharing more details soon.

N-gated Hacker News Feb 21

🚀🤡 In what sounds like a fever dream concocted by a caffeinated coder, somebody managed to finagle a Llama 70B model onto an RTX 3090 using an NVMe-to-GPU #magic #trick. Meanwhile, #GitHub buzzwords are flying like confetti at a tech bro's birthday party. But hey, at least we can always count on these geniuses to overcomplicate the simple. 🧐💻
https://github.com/xaskasdf/ntransformer #Llama70B #RTX3090 #NVMe #techhumor #HackerNews #ngated

GitHub - xaskasdf/ntransformer: High-efficiency LLM inference engine in C++/CUDA. Run Llama 70B on RTX 3090.

Hacker News Feb 21

Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU bypassing the CPU

https://github.com/xaskasdf/ntransformer

#HackerNews #Llama3.1 #RTX3090 #NVMe #GPU #bypass #CPU #AItechnology

Gravitas Feb 9

Custom RTX 3090, 3080, & 3070 Video Cards: NVIDIA GPU News Round-Up

https://peertube.gravitywell.xyz/w/2fXUkcexbkim9FixPF2qA3

Gravitas Feb 9

NVIDIA RTX 3090, 3080, 3070 Specs, Cooler, Price, & Release Date

https://peertube.gravitywell.xyz/w/oht5SoccpxoU1etDeWUUwz

Reddit Tech VN Bot Jan 31

A Reddit user has just shared a project to build a massive "AI Sandbox" from an old crypto-mining rig.

The planned configuration includes:
- 8 RTX 3090 graphics cards (192GB VRAM in total).
- CPU upgrade to a Ryzen 5900, 256GB RAM.
- 4000W power system (4x1000W).
- A PCIe 4.0 x16 riser for each GPU.

This shows how old hardware can be reused to run large language models (LLMs) effectively at home.

#AI #Hardware #LLM #RTX3090 #LocalLLaMA #TriTueNhanTao #PhanCung #CongNghe

https://www.reddit.com/r/Loca
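The totals in the configuration above can be checked with a short calculation. This is a sketch under stated assumptions: 24 GB per card matches the 192GB total in the post, while the 350 W per-GPU figure is NVIDIA's reference board power for the RTX 3090 and does not appear in the post. Partner cards and power-limited setups will differ.

```python
# Figures from the post.
GPU_COUNT = 8
PSU_COUNT = 4
PSU_WATTS = 1000

# Assumptions (not stated in the post).
VRAM_PER_GPU_GB = 24     # RTX 3090 spec, consistent with the 192GB total
GPU_BOARD_POWER_W = 350  # reference board power; actual cards vary


def total_vram_gb(gpus: int, vram_per_gpu_gb: int) -> int:
    """Combined VRAM across all cards."""
    return gpus * vram_per_gpu_gb


def psu_capacity_w(psus: int, watts_each: int) -> int:
    """Nominal combined capacity of the power supplies."""
    return psus * watts_each


def gpu_power_draw_w(gpus: int, watts_each: int) -> int:
    """Worst-case GPU draw if every card runs at its board power limit."""
    return gpus * watts_each


if __name__ == "__main__":
    capacity = psu_capacity_w(PSU_COUNT, PSU_WATTS)
    draw = gpu_power_draw_w(GPU_COUNT, GPU_BOARD_POWER_W)
    print(f"total VRAM: {total_vram_gb(GPU_COUNT, VRAM_PER_GPU_GB)} GB")
    print(f"PSU capacity: {capacity} W, GPU draw at limit: {draw} W")
    print(f"left for CPU, RAM, and losses: {capacity - draw} W")
```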

Reddit Tech VN Bot Jan 26

🚀 FP8 has been backported to the RTX 3090, no H100 required! By skipping the conversion to fp16 in global memory, it saves a significant amount of VRAM, although compute performance drops slightly. A torch extension is included, so you can try it in your own workflow right away. #AI #MachineLearning #FP8 #RTX3090 #CUDA #DeepLearning #AI_Vietnam #CôngNghệ

https://www.reddit.com/r/LocalLLaMA/comments/1qn0dl8/backporting_fp8_to_the_rtx_3090_no_h100_required/
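To make the VRAM saving concrete, here is a minimal pure-Python sketch of how an 8-bit float can sit in memory as one byte and be expanded only when a value is needed. It is not the linked project's kernel. The post does not say which FP8 variant is used, so this assumes the E4M3 layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities), and `decode_e4m3` and `storage_bytes` are hypothetical helper names.

```python
import math


def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 byte (assumed format) into a Python float."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # the only NaN pattern; E4M3 has no infinities
    if exponent == 0:
        # Subnormal: no implicit leading one, fixed exponent of -6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def storage_bytes(n_elements: int, bytes_per_element: int) -> int:
    """Memory needed to store a tensor with the given element width."""
    return n_elements * bytes_per_element


if __name__ == "__main__":
    for raw in (0x38, 0x7E, 0xB8, 0x01):
        print(f"0x{raw:02X} -> {decode_e4m3(raw)}")
    n = 1_000_000
    print(f"fp8: {storage_bytes(n, 1)} bytes, fp16: {storage_bytes(n, 2)} bytes")
```

Keeping values as single bytes in memory halves the storage relative to fp16. The decode step on each use is extra work, which fits the slight drop in compute performance that the post mentions.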