fly51fly (@fly51fly)

NVIDIA and UC Berkeley have released PivotRL, an agentic post-training method that achieves high accuracy at low compute cost. It is a reinforcement-learning-based post-training approach that improves agent performance even on a modest compute budget, making it useful research for building practical LLM agents.

https://x.com/fly51fly/status/2036560264972345392

#pivotrl #agentic #posttraining #reinforcementlearning #nvidia


[LG] PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost J Yi, D Mosk-Aoyama, B Huang, R Gala… [NVIDIA & UC Berkeley] (2026) https://t.co/GjdsQOd3AO

#LLMs learn various #characterarchetypes during #pretraining. #Posttraining focuses on the “#Assistant” persona, but its stability is uncertain. Researchers mapped a “persona space” for LLMs, finding that the “#AssistantAxis” aligns with helpful, professional archetypes. Monitoring and capping activations along this axis can prevent models from drifting into harmful personas, enhancing their stability and safety. https://www.anthropic.com/research/assistant-axis?AIagents.at #AIagent #AI #ML #NLP #LLM #GenAI
The assistant axis: situating and stabilizing the character of large language models


How do you get started with post-training / fine-tuning of LLMs? Practitioners are sharing their experience fine-tuning the Hermes model to strong performance. #LLMs #Hermes #FineTuning #PostTraining #AI #TríTuệNhânTạo #ĐàoTạoMáyTính

https://www.reddit.com/r/LocalLLaMA/comments/1p22zew/advice_for_getting_into_posttraining_finetuning/

What should you do if your academic publisher asks you to license a monograph for AI training?

A few people have asked my advice on this recently so I’m sharing here in case it’s useful:

  • Check whether models have been trained on your monographs (The Atlantic's searchable LibGen database, linked below, is one way to do this).
  • If your work has already been used for training, it’s unlikely it will ever be removed from models. Therefore you’re effectively receiving some (inadequate) compensation for the theft of your intellectual property.
  • If your work hasn’t been used for training, it’s a case of weighing up the advantages against the disadvantages. Training on your work means you might be more likely to be visible within the model (i.e. more likely to be invoked in response to a prompt about your domain) but this is a deeply unpredictable matter. Conversely it means your work might be diffused in a way that means your intellectual labour is chopped up and repackaged without any link to you.
  • So it’s a case of considering how much you value the potential visibility, which I would argue is non-trivial, against how much the potential severing of the link between your ideas and your authorship bothers you.

If it helps, I agonised about this in my role as a literary executor (cared much less about my own work) and reached the conclusion that diffusion of the ideas is best served by being incorporated into training. I wouldn’t expect everyone to reach the same conclusion but I hope it’s useful to make these suggestions about factors to consider.

#intellectualProperty #LLMs #postTraining #publishing #scholarlyPublishing #Training #visibility

Search LibGen, the Pirated-Books Database That Meta Used to Train AI

Millions of books and scientific papers are captured in the collection’s current iteration.

The Atlantic