Mastodawn

The understated loading design inside Transformers that saves memory

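The Transformers library uses PyTorch's meta device to load large models efficiently without doubling memory use. A tensor on the meta device holds only a parameter's metadata (such as shape and dtype), so it allocates no storage, and safetensors slices lazily load only the tensors that are needed. The loader also chooses between asynchronous and synchronous loading paths depending on the situation, and supports disk offloading to ease memory pressure. This design makes it possible to handle large models of 70B parameters or more in memory-constrained environments.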

https://www.stevhliu.com/2026/transformers-compendium-1

#transformers #pytorch #memoryoptimization #modelloading #safetensors

Transformers Compendium - Part 1

A collection of engineering details and design in Transformers.
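
To make the lazy-loading idea from the post above concrete, here is a minimal standard-library sketch. It is not how Transformers or the safetensors library implement loading; it only illustrates the principle that a safetensors file starts with an 8-byte little-endian header length followed by a JSON header, so a reader can learn every tensor's shape without touching the weights and then seek to just the bytes it needs. The helper names (`write_safetensors`, `read_header`, `load_tensor`, `demo`) and the tensor names are hypothetical.

```python
import json
import os
import struct
import tempfile


def write_safetensors(path, tensors):
    """Write {name: (shape, values)} as float32 using the safetensors layout.

    Simplified: real files may also carry a "__metadata__" header entry.
    """
    header, blobs, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        blob = struct.pack(f"<{len(values)}f", *values)
        # data_offsets are relative to the start of the byte buffer
        header[name] = {"dtype": "F32", "shape": shape,
                        "data_offsets": [offset, offset + len(blob)]}
        blobs.append(blob)
        offset += len(blob)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header size
        f.write(header_bytes)
        f.write(b"".join(blobs))


def read_header(path):
    """Read metadata only; no tensor bytes are read (the 'meta' view)."""
    with open(path, "rb") as f:
        (size,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(size))
    return header, 8 + size  # header dict, offset where tensor data begins


def load_tensor(path, header, data_start, name):
    """Seek to one tensor's slice and read only those bytes."""
    begin, end = header[name]["data_offsets"]
    with open(path, "rb") as f:
        f.seek(data_start + begin)
        raw = f.read(end - begin)
    return list(struct.unpack(f"<{(end - begin) // 4}f", raw))


def demo():
    path = os.path.join(tempfile.mkdtemp(), "tiny.safetensors")
    write_safetensors(path, {
        "embed.weight": ([2, 2], [1.0, 2.0, 3.0, 4.0]),
        "head.bias": ([3], [0.5, -0.5, 0.25]),
    })
    header, data_start = read_header(path)
    shapes = {name: info["shape"] for name, info in header.items()}
    bias = load_tensor(path, header, data_start, "head.bias")
    return shapes, bias


if __name__ == "__main__":
    print(demo())
```

The same split between "know the shapes first, materialize the bytes later" is what lets a loader build a full model skeleton cheaply and fill it in one tensor at a time.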

sayzard Apr 14

leopardracer (@leopardracer)

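The post claims that a configuration flag has appeared that overturns the common assumption that 16GB of memory is not enough to run a 35B model. It looks like a useful technical improvement for running large language models locally and for memory optimization.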

https://x.com/leopardracer/status/2043979806958596551

#llm #localai #memoryoptimization #35b #aimodel

leopardracer (@leopardracer) on X

Everyone said 16GB isn’t enough for a 35B model. They were right. Until this one flag.

X (formerly Twitter)

N-gated Hacker News Mar 27

🎉 Wow, a groundbreaking realization! 🧠 Memory optimization is back in style because AI hoarders allegedly bought all the #RAM. Who knew? Next up: inventing fire! 🔥
https://nibblestew.blogspot.com/2026/03/everything-old-is-new-again-memory.html #memoryoptimization #AIhoarders #shortage #technews #groundbreakinginnovation #HackerNews #ngated

Everything old is new again: memory optimization

At this point in history, AI sociopaths have purchased all the world's RAM in order to run their copyright infringement factories at full bl...

Hacker News Mar 27

Everything old is new again: memory optimization

https://nibblestew.blogspot.com/2026/03/everything-old-is-new-again-memory.html

#HackerNews #memoryoptimization #nostalgia #techinnovation #dataefficiency #memorymanagement

sayzard Feb 19

Jarred Sumner (@jarredsumner)

An announcement that the Claude Code v2.1.47 update reduces memory usage in long-running sessions. The improvement was contributed by @cirospaciari, and users are asked to keep reporting issues. A notable minor update on developer tool performance.

https://x.com/jarredsumner/status/2024289291879534793

#claude #memoryoptimization #release #developertools

Jarred Sumner (@jarredsumner) on X

Long-running Claude Code sessions use less memory in v2.1.47, thanks to @cirospaciari Keep reporting issues and the team will fix

X (formerly Twitter)

sayzard Feb 6

Vali Neagu (@AmbsdOP)

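The project makes the same API call as the original Gradio version, but the author found high memory usage on Apple devices in the cover feature. They are currently focusing on memory optimization and plan to push a PR soon. A progress update on performance improvements for a developer tool.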

https://x.com/AmbsdOP/status/2019503866929164666

#gradio #memoryoptimization #apple #api #pullrequest

Vali Neagu (@AmbsdOP) on X

@joanplanas @cocktailpeanut We are doing the same API call as the original Gradio version, but I noticed some high memory usage on Apple devices for the cover feature. Right now, I'm focusing on memory optimization. I will push a PR soon.

X (formerly Twitter)

sayzard Jan 19

gatehouse (@imangegatehouse)

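The tweet claims that @deepseek_ai may have found a way to solve the memory (RAM) problem by eliminating the need for expensive HBM (High-Bandwidth Memory) in AI inference and training. It also notes that DRAM prices rose 5x in 10 weeks, hinting at the potential impact on hardware costs and memory innovation.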

https://x.com/imangegatehouse/status/2013167288728338722

#hbm #dram #memoryoptimization #aiinference

gatehouse (@imangegatehouse) on X

“@deepseek_ai may have found a way to solve the RAM crisis by eliminating the need for expensive HBM for AI inference and training — yes, the very reason why DRAM prices went up by 5X in 10 weeks” https://t.co/vPRamjORKE

X (formerly Twitter)

Reddit Tech VN Bot Jan 9

😂 A user asks: how do you manage 100+ ChatGPT conversations? Store the KV cache (costs RAM) or recompute it when a conversation resumes (costs CPU)? They are looking for a balanced solution from devs who build their own LLM chatbots. #MachineLearning #KVcache #ComputationalTradeoff #ChatbotDevelopment #MemoryOptimization #TríTuệNhânTạo #TốiƯuHiệuSuất #TransformerModel #GiaoTiếpAI

https://www.reddit.com/r/LocalLLaMA/comments/1q8eqtc/longterm_kv_cache_storage_or_reruns_for_ongoing/
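
The RAM side of the trade-off in the question above can be estimated with simple arithmetic: a KV cache stores one key and one value vector per layer, per KV head, per token. The sketch below applies that formula to a hypothetical model configuration (32 layers, 8 KV heads, head dimension 128, fp16); these numbers are illustrative assumptions, not taken from the linked thread.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one conversation.

    The leading factor of 2 accounts for storing both K and V.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical configuration, chosen only for illustration.
    per_chat = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=4096)
    print(f"one 4096-token chat: {per_chat / 2**20:.0f} MiB")
    print(f"100 such chats:      {100 * per_chat / 2**30:.0f} GiB")
```

Under these assumptions a single 4096-token conversation needs 512 MiB, so keeping 100 of them resident would take about 50 GiB, which is why recomputing on resume or spilling caches to disk comes up as an alternative.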

Hacker News Oct 8, 2025

Memory access is O(N^[1/3])

https://vitalik.eth.limo/general/2025/10/05/memory13.html

#HackerNews #MemoryAccess #O(N^1/3) #MemoryOptimization #TechTrends #VitalikButerin

N-gated Hacker News Sep 22, 2025

Ah yes, because nothing screams 'cutting-edge tech' 🚀 like cramming bits into a pointer like a sardine tin 🐟. Let's glorify the fact that most of that precious memory space remains as barren as my social calendar 🎉. Enjoy saving memory at the expense of your sanity! 🤪💾
https://vectrx.substack.com/p/pointer-tagging-in-c-the-art-of-packing #cuttingEdgeTech #memoryOptimization #sardineTin #techHumor #softwareDevelopment #sanitySavings #HackerNews #ngated

Pointer Tagging in C++: The Art of Packing Bits Into a Pointer

Using tagged pointers to save memory, speed up dynamic dispatch, and compact data structures

Vectorized
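
For readers unfamiliar with the technique named in the linked article, here is a small simulation of pointer tagging using plain integers in place of real addresses. It assumes the common low-bit variant, where 8-byte alignment leaves the bottom 3 bits of every address at zero and free to hold a tag; the article itself may use different bits, and the function names `pack` and `unpack` are hypothetical.

```python
TAG_BITS = 3                      # free low bits under 8-byte alignment
TAG_MASK = (1 << TAG_BITS) - 1    # 0b111


def pack(address, tag):
    """Store a small tag in the unused low bits of an aligned address."""
    if address & TAG_MASK:
        raise ValueError("address must be 8-byte aligned")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in the free bits")
    return address | tag


def unpack(tagged):
    """Recover (address, tag) from a tagged value."""
    return tagged & ~TAG_MASK, tagged & TAG_MASK


if __name__ == "__main__":
    tagged = pack(0x7FFE1000, 5)
    print(hex(tagged), unpack(tagged))
```

The tag travels with the pointer at no extra storage cost, at the price of masking it off before every dereference.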