fly51fly (@fly51fly)

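Researchers from INRIA Lille and Google DeepMind have posted "Sample-efficient Monte-Carlo planning" on arXiv, a paper on a sample-efficient Monte-Carlo planning method. It looks like new work in reinforcement learning and planning on searching more efficiently with fewer samples.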

https://x.com/fly51fly/status/2045252557430493624

#reinforcementlearning #planning #montecarlo #deeplearning #arxiv

fly51fly (@fly51fly) on X

[CL] Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning J Grill, M Valko, R Munos [INRIA Lille & Google DeepMind] (2026) https://t.co/mt11Ph7iAv

X (formerly Twitter)

fly51fly (@fly51fly)

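New research on the structured reduction of large language models. It proposes combining compressed sensing with inference-aware techniques to shrink LLMs more efficiently, making it a notable contribution to model compression and optimization.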

https://x.com/fly51fly/status/2045260733974508012

#llm #compression #optimization #arxiv #efficiency

[CL] Compressed-Sensing-Guided, Inference-Aware Structured Reduction for Large Language Models A Kiruluta [UC Berkeley] (2026) https://t.co/l88OOcPxGs

SIR-Bench: a benchmark for security incident response agents

https://arxiv.org/abs/2604.12040

#arxiv #security

SIR-Bench: Evaluating Investigation Depth in Security Incident Response Agents

We present SIR-Bench, a benchmark of 794 test cases for evaluating autonomous security incident response agents that distinguishes genuine forensic investigation from alert parroting. Derived from 129 anonymized incident patterns with expert-validated ground truth, SIR-Bench measures not only whether agents reach correct triage decisions, but whether they discover novel evidence through active investigation. To construct SIR-Bench, we develop Once Upon A Threat (OUAT), a framework that replays real incident patterns in controlled cloud environments, producing authentic telemetry with measurable investigation outcomes. Our evaluation methodology introduces three complementary metrics: triage accuracy (M1), novel finding discovery (M2), and tool usage appropriateness (M3), assessed through an adversarial LLM-as-Judge that inverts the burden of proof -- requiring concrete forensic evidence to credit investigations. Evaluating our SIR agent on the benchmark demonstrates 97.1% true positive (TP) detection, 73.4% false positive (FP) rejection, and 5.67 novel key findings per case, establishing a baseline against which future investigation agents can be measured.
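SIR-Bench reports triage results as a TP detection rate (97.1%) and an FP rejection rate (73.4%). Below is a minimal sketch of how rates like these are commonly computed from labeled cases, assuming each case carries a ground-truth label (genuine incident vs. false positive) and a binary agent decision (escalate vs. dismiss). The `CaseResult` type and `triage_rates` function are hypothetical illustrations, not the benchmark's code or its exact metric definitions.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    # Hypothetical record: ground truth and the agent's triage decision.
    is_true_positive: bool  # True if the alert is a genuine incident
    agent_escalated: bool   # True if the agent flagged it as an incident


def triage_rates(results):
    """Return (tp_detection, fp_rejection) as fractions in [0, 1].

    tp_detection: share of genuine incidents the agent escalated.
    fp_rejection: share of false-positive alerts the agent dismissed.
    Returns 0.0 for a rate whose class has no cases.
    """
    tps = [r for r in results if r.is_true_positive]
    fps = [r for r in results if not r.is_true_positive]
    tp_detection = sum(1 for r in tps if r.agent_escalated) / len(tps) if tps else 0.0
    fp_rejection = sum(1 for r in fps if not r.agent_escalated) / len(fps) if fps else 0.0
    return tp_detection, fp_rejection


# Toy example: 4 genuine incidents (3 escalated), 2 false alarms (1 dismissed).
example = [
    CaseResult(True, True),
    CaseResult(True, True),
    CaseResult(True, True),
    CaseResult(True, False),
    CaseResult(False, False),
    CaseResult(False, True),
]
print(triage_rates(example))  # (0.75, 0.5)
```

Reporting the two rates separately, rather than one accuracy number, keeps an agent that escalates everything from looking good: it would score 100% TP detection but 0% FP rejection.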

arXiv.org

fly51fly (@fly51fly)

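A study titled "Continuous Knowledge Metabolism," on generating new hypotheses from evolving scientific literature, has been introduced. It presents an AI methodology for automatically generating scientific hypotheses in a setting where the literature is continuously updated, with potential uses in automating research exploration, hypothesis discovery, and knowledge accumulation.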

https://x.com/fly51fly/status/2044530851913077062

#ai #scientificresearch #hypothesisgeneration #arxiv #llm

[CL] Continuous Knowledge Metabolism: Generating Scientific Hypotheses from Evolving Literature J Tao, Y Wang, X Liu, M Yang [Central University of Finance and Economics & Beijing Institute of Technology & TsingyuAI] (2026) https://t.co/gN7LtgVu7v

fly51fly (@fly51fly)

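A study on the "Linear Centroids Hypothesis," which addresses how deep network features represent data, has been shared. The paper is by researchers at Rice University, Google Research, and Brown University and reports findings on representation learning and the interpretation of network features.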

https://x.com/fly51fly/status/2044533831953486105

#research #deeplearning #representationlearning #googleresearch #arxiv

[LG] The Linear Centroids Hypothesis: How Deep Network Features Represent Data T Walker, A I Humayun, R Balestriero, R Baraniuk [Rice University & Google Research & Brown University] (2026) https://t.co/H2ZYZBtXb1

Today on the #arXiv :

Hirabayashi et al. 2026, "Overview of Hayabusa2 extended mission's flyby of Near-Earth Asteroid (98943) Torifune" - https://arxiv.org/abs/2604.08832

Toshi Hirabayashi and company review the plan for the Hayabusa2 spacecraft's flyby of asteroid Torifune, which is now less than 3 months away.

Overview of Hayabusa2 extended mission's flyby of Near-Earth Asteroid (98943) Torifune

The Hayabusa2 extended mission, nicknamed Hayabusa2# (# is pronounced SHARP, which stands for the Small Hazardous Asteroid Reconnaissance Probe), is JAXA's small body explorer to conduct science and engineering investigations in space. After the successful return to the Earth with the samples from the carbonaceous asteroid (162173) Ryugu on December 6, 2020, Hayabusa2 diverted away from Earth to start its decade-long extended mission. The major scope includes engineering demonstration of long-term maintenance strategies for spacecraft and operation systems and scientific investigations during various mission phases. Major scientific investigations include spacecraft-based telescopic observations of exoplanets and zodiacal dust observations during the cruise phase, flyby observations of the near-Earth asteroid (98943) Torifune in July 2026, and rendezvous observations of near-Earth asteroid 1998 KY26 in 2031. This study overviews Hayabusa2#'s flyby and the physical properties of Torifune. Although the flyby operation planning is still ongoing, the mission will attempt to fly by the target at a distance (from the asteroid's center) of ~1-10 km. The flyby speed is planned to be 5.25 km/s, while the encounter location is 0.81 au from the sun. The mission plans to fix the spacecraft's orientation during the flyby, only allowing for a very limited pointing change to attain higher resolution imaging. The mission will attempt to obtain science and engineering returns during the flyby. The planned investigations will offer stronger insights into material transport mechanisms in the inner solar system and a demonstration of planetary defense technologies.

The whole "Software Engineering" category on arXiv has become LLM spam. Unsubscribed.

#arXiv #SoftwareEngineering #LLM

Major shift in scientific infrastructure: arXiv is leaving university governance to become an independent organization.

While some see this as a gain in long-term stability, others point to a possible move toward commercialization.

🔗 https://www.science.org/content/article/arxiv-pioneering-preprint-server-declares-independence-cornell

#arxiv #Preprints #openscience #ScientificPublishing #ResearchInfrastructure

Superintelligence in My Hands: 4 Months of AI Use as a Frontend Developer (DSL Compiler + arXiv Paper)

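A post in which a frontend developer shares how they used AI to design a DSL compiler for a state description language (MEL) and then posted a paper preprint to arXiv. It argues that AI can take you up to 80% of expert level, but the remaining 20%, which calls for final judgment and for revising one's own questions, is still the human's role. It also gives a practical analysis of AI's strengths and weaknesses, including AI-assisted development tools and applied examples. Related link: eggp.dev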

https://news.hada.io/topic?id=28548

#aidevelopment #dslcompiler #arxiv #frontenddevelopment #aiapplication

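I've written up, candidly, the process of using AI as a thinking partner to design a DSL compiler for a state description language (MEL), all the way to posting a preprint on arXiv. What I wanted to say is that AI can raise you to 80% of expert level, but the remaining 20%, the sense that "I'm asking the wrong question," is still the human's job.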

GeekNews

fly51fly (@fly51fly)

A paper from Together AI proposing a new language-model approach, Introspective Diffusion Language Models, has been shared.

https://x.com/fly51fly/status/2044169743876420065

#diffusion #languagemodel #research #togetherai #arxiv

[LG] Introspective Diffusion Language Models Y Yu, Y Jian, J Wang, Z Zhou… [Together AI] (2026) https://t.co/nJ67v074H3

X (formerly Twitter)