Arcee Trinity Large Technical Report
Arcee Trinity Large is a sparse Mixture-of-Experts (MoE) model with 400 billion total parameters, of which 13 billion are activated per token. The report also introduces Trinity Nano (6B parameters) and Trinity Mini (26B parameters); all three share a modern architecture, and Trinity Large additionally uses a new MoE load-balancing strategy, SMEBU. The models were trained with the Muon optimizer and pre-trained on large token datasets (up to 17 trillion tokens). This technical report should become an important reference for designing and training large-scale sparse models.

https://arxiv.org/abs/2602.17004

#machinelearning #mixtureofexperts #largescale #transformers #optimization

Arcee Trinity Large Technical Report

We present the technical report for Arcee Trinity Large, a sparse Mixture-of-Experts model with 400B total parameters and 13B activated per token. Additionally, we report on Trinity Nano and Trinity Mini: Trinity Nano has 6B total parameters with 1B activated per token, and Trinity Mini has 26B total parameters with 3B activated per token. The models' modern architecture includes interleaved local and global attention, gated attention, depth-scaled sandwich norm, and sigmoid routing for Mixture-of-Experts. For Trinity Large, we also introduce a new MoE load balancing strategy titled Soft-clamped Momentum Expert Bias Updates (SMEBU). We train the models using the Muon optimizer. All three models completed training with zero loss spikes. Trinity Nano and Trinity Mini were pre-trained on 10 trillion tokens, and Trinity Large was pre-trained on 17 trillion tokens. The model checkpoints are available at https://huggingface.co/arcee-ai.
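The abstract names sigmoid routing and SMEBU without spelling out the update rule, so the sketch below is an assumption-laden illustration rather than Arcee's published algorithm: it models SMEBU as a bias-based, auxiliary-loss-free balancing scheme in which a per-expert bias steers top-k selection (but not the gate weights), is updated with momentum against the measured load imbalance, and is soft-clamped with tanh. The class name, update rule, and hyperparameters (`SigmoidRouter`, `lr`, `momentum`, `clamp`) are all hypothetical.

```python
# Hedged sketch: sigmoid MoE routing with an SMEBU-style bias update.
# The report only names the technique; the rule below (momentum on a
# per-expert bias used for top-k selection only, tanh soft clamp) is
# an assumption, not Arcee's published algorithm.
import numpy as np

class SigmoidRouter:
    def __init__(self, d_model, n_experts, top_k,
                 lr=1e-3, momentum=0.9, clamp=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)
        self.bias = np.zeros(n_experts)   # steers selection, not gate weights
        self.vel = np.zeros(n_experts)    # momentum buffer for bias updates
        self.n_experts, self.top_k = n_experts, top_k
        self.lr, self.momentum, self.clamp = lr, momentum, clamp

    def route(self, x):
        """x: (tokens, d_model) -> (expert ids, normalized gate weights)."""
        scores = 1.0 / (1.0 + np.exp(-(x @ self.w)))       # sigmoid affinities
        sel = np.argsort(scores + self.bias, axis=-1)[:, -self.top_k:]
        gates = np.take_along_axis(scores, sel, axis=-1)   # raw scores, no bias
        return sel, gates / gates.sum(axis=-1, keepdims=True)

    def smebu_step(self, sel):
        """Nudge biases toward balanced load: underloaded experts get a
        positive push, then the bias is soft-clamped into [-clamp, clamp]."""
        load = np.bincount(sel.ravel(), minlength=self.n_experts) / sel.size
        err = load.mean() - load              # >0 where an expert is starved
        self.vel = self.momentum * self.vel + (1.0 - self.momentum) * err
        self.bias = self.clamp * np.tanh((self.bias + self.lr * self.vel) / self.clamp)

# Toy usage: route a batch, then apply one balancing step.
router = SigmoidRouter(d_model=64, n_experts=8, top_k=2)
x = np.random.default_rng(1).standard_normal((32, 64))
sel, gates = router.route(x)
router.smebu_step(sel)
```

The tanh keeps any single expert's bias bounded, so balancing pressure can never fully override the learned affinities; whether Arcee clamps this way, or what "soft-clamped" means exactly in their setup, is not specified in the abstract.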

arXiv.org
Schools Send Out Newsletters Regarding Safety Ahead Of BTS’s Large-Scale Comeback Stage

They are on another level.

Kpop News Hub
BTS’s World Tour Signifies A Large-Scale Issue K-Pop Is Facing

Finally making a group comeback following a hiatus due to military service, BTS announced a tour that will take them around the globe. However, experts have taken…

Kpop News Hub
Beijing has announced extensive military exercises encircling Taiwan, escalating tensions in the region and drawing condemnation from Taipei. The maneuvers, dub... https://news.osna.fm/?p=28800 | #news #amid #china #drills #largescale
China Launches Large-Scale Drills Near Taiwan Amid Rising Tensions

China's massive military drills encircling Taiwan escalate regional tensions: breaking news and analysis of the developing situation.

Osna.FM

The latest #LargeScale experiment on #testosterone manipulation fails to replicate previous findings [typically from smaller-scale studies] on #economic #preferences in men, calling into question the idea that short-term testosterone fluctuations are important drivers of men’s economic preferences.

Investigating the effects of single-dose intranasal testosterone on economic preferences in a large #RandomizedTrial of men

@PNASNews
#OpenAccess

https://www.pnas.org/doi/10.1073/pnas.2508519122

Tuesday, May 27, 2025

Secret note reveals Russia using Telegram bots to control drones attacking Ukraine — Russia can attack Europe 2-4 years after war’s end, faster with lifted sanctions — ‘Russia is not winning this war,’ EU defense commissioner says — Latvia urges EU-wide halt to Russian visas over security concerns — Why did Russia invade Ukraine? Debunking the Kremlin’s ‘root causes’ claims… and more

https://activitypub.writeworks.uk/2025/05/tuesday-may-27-2025/

Monday, May 26, 2025

China supplying Russian military factories with chemicals, gunpowder, components — Russia ‘categorically’ rejected unconditional ceasefire in peace talks — Russians mock US and peace process with latest attacks on Ukraine, EU ambassador says — Putin is not interested in peace; German FM calls for additional sanctions following large-scale Russian attack on Ukraine

https://activitypub.writeworks.uk/2025/05/monday-may-26-2025/

Piccolo: Large-Scale Graph Processing with Fine-Grained In-Memory Scatter-Gather

https://arxiv.org/abs/2503.05116

#HackerNews #Piccolo #Graph #Processing #In-Memory #ScatterGather #LargeScale #Computing

Piccolo: Large-Scale Graph Processing with Fine-Grained In-Memory Scatter-Gather

Graph processing requires irregular, fine-grained random access patterns that are incompatible with contemporary off-chip memory architecture, leading to inefficient data access. This inefficiency makes graph processing an extremely memory-bound application. Because of this, existing graph processing accelerators typically employ a graph tiling-based or processing-in-memory (PIM) approach to relieve the memory bottleneck. In the tiling-based approach, a graph is split into chunks that fit within the on-chip cache to maximize data reuse. In the PIM approach, arithmetic units are placed within memory to perform operations such as reduction or atomic addition.

However, both approaches have several limitations, especially when implemented on current memory standards (i.e., DDR). Because the access granularity provided by DDR is much larger than that of the graph vertex property data, much of the bandwidth and cache capacity is wasted. PIM is meant to alleviate such issues, but it is difficult to use in conjunction with the tiling-based approach, resulting in a significant disadvantage. Furthermore, placing arithmetic units inside a memory chip is expensive, so supporting multiple types of operations is considered impractical.

To address these limitations, we present Piccolo, an end-to-end efficient graph processing accelerator with fine-grained in-memory random scatter-gather. Instead of placing expensive arithmetic units in off-chip memory, Piccolo focuses on reducing off-chip traffic with a non-arithmetic function-in-memory for random scatter-gather. To fully benefit from in-memory scatter-gather, Piccolo redesigns the accelerator's cache and MHA so that it enjoys both the advantages of tiling and of in-memory operations. Piccolo achieves a maximum speedup of 3.28× and a geometric-mean speedup of 1.62× across a broad set of benchmarks.
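To make the bandwidth-waste argument concrete, here is a hedged back-of-the-envelope model (not from the paper): it counts how many coarse DDR lines a conventional gather of small vertex properties touches versus the bytes the algorithm actually needs, which is the overfetch that fine-grained in-memory scatter-gather is designed to eliminate. The 64-byte line size, 4-byte property size, and random-neighbor workload are illustrative assumptions, not Piccolo's actual parameters.

```python
# Hedged toy model of DDR overfetch during an irregular vertex-property gather.
# All sizes and the access pattern are illustrative assumptions.
import numpy as np

LINE = 64          # bytes per DDR access granularity (assumed)
PROP = 4           # bytes per vertex property, e.g. one float32 rank value
N_VERTS = 1 << 20  # vertices in the toy graph

rng = np.random.default_rng(0)
neighbors = rng.integers(0, N_VERTS, size=100_000)   # irregular gather indices

# Coarse-grained path: every property read drags in a whole 64 B line.
lines_touched = np.unique(neighbors * PROP // LINE)
fetched = lines_touched.size * LINE        # bytes the memory system moves
useful = np.unique(neighbors).size * PROP  # bytes the algorithm needs

print(f"fetched {fetched / 2**20:.2f} MiB to obtain {useful / 2**20:.2f} MiB "
      f"of properties ({fetched / useful:.1f}x overfetch)")
```

Under these assumptions the coarse-grained path moves several times more data than the gather actually consumes; a fine-grained in-memory scatter-gather would ideally move only the `useful` bytes, which is the gap the abstract's speedup numbers reflect.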

arXiv.org