Mastodawn

Show HN: Describe what makes a photo "bad" and let a local LLM flag them

BadPhotosOut는 macOS용 네이티브 앱으로, 로컬 Ollama 비전 모델을 활용해 사용자가 지정한 텍스트 기준에 따라 사진 라이브러리 내 사진을 평가하고 '나쁜' 사진을 플래그한다. 사진 데이터는 로컬에서만 처리되며, 자동 삭제 기능은 없고 사용자가 직접 사진 앱에서 삭제해야 한다. Ollama 서버와 gemma4:e4b 모델을 사용하며, 사진 분석 결과는 캐시되어 재분석 속도를 높인다. AI 기반 사진 품질 필터링을 로컬 환경에서 구현한 사례로, 프라이버시를 중시하는 AI 응용에 참고할 만하다.

https://github.com/iamnotagentleman/bad-photos-out

#llm #visionmodel #macos #localinference #photofiltering

GitHub - iamnotagentleman/bad-photos-out: Definitely not vibe coded trust me

Definitely not vibe coded trust me. Contribute to iamnotagentleman/bad-photos-out development by creating an account on GitHub.

GitHub

sayzard Apr 29

Unsloth AI (@UnslothAI)

Mistral이 새로운 비전 추론 모델 Mistral Medium 3.5를 공개했습니다. Mistral-Medium-3.5-128B는 훨씬 큰 모델들에 필적하는 성능을 보이며, 약 64GB RAM 환경에서 로컬 실행이 가능해 경량 배포와 온디바이스 활용 측면에서 주목됩니다.

https://x.com/UnslothAI/status/2049511248623256017

#mistral #visionmodel #reasoning #opensource #llm

Unsloth AI (@UnslothAI) on X

Mistral releases Mistral Medium 3.5, a new vision reasoning model. 🔥 Mistral-Medium-3.5-128B offers highly competitive performance for models 6x its size. Run locally on ~64GB RAM. Guide: https://t.co/ztAVzgJECr GGUFs: https://t.co/3PoF66KyZM

X (formerly Twitter)

sayzard Apr 29

Unsloth AI (@UnslothAI)

Mistral이 새로운 비전 추론 모델 Mistral Medium 3.5를 공개했다. Mistral-Medium-3.5-128B는 모델 크기의 6배에 달하는 경쟁력 있는 성능을 내며, 약 64GB RAM 환경에서 로컬 실행이 가능하다. 안내 문서와 GGUF 배포본도 함께 제공된다.

https://x.com/UnslothAI/status/2049511248623256017

#mistral #visionmodel #reasoningmodel #llm #huggingface

Unsloth AI (@UnslothAI) on X

X (formerly Twitter)

sayzard Apr 27

Avi Chawla (@_avichawla)

DeepSeek-OCR을 자신의 언어 데이터로 로컬에서 파인튜닝해 사용할 수 있다는 소개입니다. 기존 문서 처리 방식이 긴 컨텍스트에서 비싸고 느린 문제를, 2D 레이아웃을 비전 토큰으로 압축하는 context optical compression으로 개선해 더 효율적인 문서 이해를 가능하게 합니다.

https://x.com/_avichawla/status/2048697640242868589

#deepseek #ocr #visionmodel #finetuning #llm

Avi Chawla (@_avichawla) on X

Fine-tune DeepSeek-OCR on your own language! (100% local) Most vision models treat documents as massive sequences of tokens, making long-context processing expensive and slow. DeepSeek-OCR uses context optical compression to convert 2D layouts into vision tokens, enabling

X (formerly Twitter)

sayzard Apr 21

Alex Cheema (@alexocheema)

Qwen3.6 35B 비전 모델을 2대의 M5 Max MacBook Pro에서 Thunderbolt 5 기반 RDMA로 구동한 사례다. 애플파크를 정확히 인식했고, John Ternus를 Jeff Williams로 잘못 식별했지만, prefix caching 덕분에 응답이 거의 즉시 나와 로컬 멀티디바이스 추론 성능을 보여준다.

https://x.com/alexocheema/status/2046396845270700350

#qwen #visionmodel #macbookpro #rdma #prefixcaching

Alex Cheema (@alexocheema) on X

Running Qwen3.6 35B (vision) on 2 x M5 Max MacBook Pro with RDMA over Thunderbolt 5. It describes the image and identifies Apple Park correctly, but misidentifies John Ternus as Jeff Williams. Near instant response with prefix caching.

X (formerly Twitter)

sayzard Apr 17

Claude (@claudeai)

Anthropic Labs가 Claude Design을 공개했다. Claude와 대화하며 프로토타입, 슬라이드, 원페이지 문서를 만들 수 있는 도구로, Claude Opus 4.7 기반이며 Pro, Max, Team, Enterprise에서 연구 프리뷰로 제공된다.

https://x.com/claudeai/status/2045156267690213649

#anthropic #claude #productdesign #prototype #visionmodel

Claude (@claudeai) on X

Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.

X (formerly Twitter)

sayzard Feb 19

[DroidClaw — 구형 안드로이드 폰을 AI 에이전트로 활용하는 오픈소스 프로젝트

DroidClaw는 구형 안드로이드 폰을 AI 에이전트로 활용하는 오픈소스 프로젝트로, 자연어 입력에 따라 화면을 인식하고 ADB를 통해 자동으로 터치 및 입력을 수행합니다. 기존 RPA 도구와 달리 좌표 하드코딩 없이 화면을 이해하고 동작하며, UI가 바뀌어도 어느 정도 적응할 수 있습니다.

https://news.hada.io/topic?id=26790

#ai #android #automation #opensource #visionmodel

DroidClaw — 구형 안드로이드 폰을 AI 에이전트로 활용하는 오픈소스 프로젝트

<p>https://github.com/unitedbyai/droidclaw<br /> 자연어로 목표를 입력하면, 화면을 인식하고 ADB를 통해 터치·입력을 자동 수행하는 모바일 AI...

GeekNews

Reddit Tech VN Bot Jan 26

Một công cụ mới cho phép bất kỳ mô hình AI nào cũng có thể điều khiển điện thoại Android nhờ tích hợp sẵn mô hình thị giác Qwen 2.5 Omni. Phần mềm mã nguồn mở này giúp nhận diện màn hình và thực hiện thao tác, tương thích cả trên máy thật lẫn máy ảo. Đột phá mở ra khả năng ứng dụng rộng rãi hơn cho các mô hình không hỗ trợ vision! #AI #VisionModel #Android #TríTuệNhânTạo #MôHìnhThịGiác

https://www.reddit.com/r/LocalLLaMA/comments/1qmzkmf/now_includes_builtin_vision_model_so_any_model/

Reddit Tech VN Bot Jan 7

Giới thiệu Lenswalker: một game RPG đi bộ 🚶‍♂️📸. Bạn đi bộ để nạp năng lượng, sau đó chụp ảnh vật thể ngoài đời. Điểm đặc biệt là AI cục bộ (Qwen3-VL) sẽ phân tích ảnh để xác định chủ thể, độ hiếm và chất lượng. Dự án đang ở giai đoạn pre-alpha.

#Gamedev #AI #RPG #IndieGame #SelfHosted #LLM #VisionModel #Tech #Game #ChơiGame #CôngNghệ #TựHost #Lenswalker

https://www.reddit.com/r/LocalLLaMA/comments/1q6cihe/i_built_a_mobile_game_where_a_local_qwen3vl_acts/

Reddit Tech VN Bot Nov 29

**Tiêu đề:** Khả năng hình ảnh ở mô hình Qwen3-VL-8B. Hướng dẫn gõ lệnh dùng Linux Mint 22.2 RX 6600 ✨
**Nội dung:** Dùng lệnh `llama-server -m ./Qwen3-VL-8B-Instruct-Q4_K_M.gguf` tại Bash/terminal để khởi động server. Tích hợp GPU RX 6600 giúp xử lý hình ảnh nhanh.
**Tag:** #HướngDẫn #AI #LinuxMint #GPURX6600 #ModelTrócNhậpTậpTinHình (Tag tiếng Anh: #Guide #AI #LinuxMint #GPURX6600 #VisionModel)

https://www.reddit.com/r/LocalLLaMA/comments/1p9t8tz/how_do_i_enable_vision_capabilities_of_a_m