Đang thử nghiệm một số mô hình ngôn ngữ, đặc biệt là dịch thuật. Có ai dùng qua gpt-oss cho dịch đa ngôn ngữ, cụ thể là tiếng châu Âu và tiếng Nhật chưa? Đã thử Mistral Small và Gemma 3, thấy ổn. Gpt-oss so ra sao? Thiếu tiêu chuẩn đánh giá khiến việc lựa chọn mô hình khó khăn. Ai có kinh nghiệm chia sẻ giúp! #AI #dịchthuật #Mistral #Gemma #AItranslation #language_models #gpt_oss

(NOTE: Post is in Vietnamese, under 500字符, includes both English & Vietnamese tags, no URLs. Original content is a

All LLMs in One Place | LLM OneStop

Access ChatGPT, Claude, Gemini, and more AI models from one unified platform. Switch between LLMs mid-conversation.

LLM OneStop
🤡 Scientists have discovered that narrowly finetuning large language models can lead to hilariously misaligned results 🤯. Who knew that stretching a rubber band in one place would make the whole thing snap? 🙄 Bravo to the geniuses who spend years fine-tuning #chaos. 👏
https://arxiv.org/abs/2502.17424 #scientificdiscovery #humor #language_models #misalignment #fine_tuning #HackerNews #ngated
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding. It asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned. Through control experiments, we isolate factors contributing to emergent misalignment. Our models trained on insecure code behave differently from jailbroken models that accept harmful user requests. Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment. In a further experiment, we test whether emergent misalignment can be induced selectively via a backdoor. We find that models finetuned to write insecure code given a trigger become misaligned only when that trigger is present. So the misalignment is hidden without knowledge of the trigger. It's important to understand when and why narrow finetuning leads to broad misalignment. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.

arXiv.org
"🧐 Researchers bravely attempt to 'liberate' snippets from books using language models, ignoring copyright like it's an optional suggestion. 📚🤖 Meanwhile, #arXiv is casually looking to hire a #DevOps engineer, because who doesn't want to work for a glorified PDF repository? 💻🎉"
https://arxiv.org/abs/2505.12546 #liberationofknowledge #copyrightissues #language_models #hiring #HackerNews #ngated
Extracting memorized pieces of (copyrighted) books from open-weight language models

Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) have memorized plaintiffs' protected expression. Drawing on adversarial ML and copyright law, we show that these polarized positions dramatically oversimplify the relationship between memorization and copyright. To do so, we leverage a recent probabilistic extraction technique to extract pieces of the Books3 dataset from 13 open-weight LLMs. Through numerous experiments, we show that it's possible to extract substantial parts of at least some books from different LLMs. This is evidence that the LLMs have memorized the extracted text; this memorized content is copied inside the model parameters. But the results are complicated: the extent of memorization varies both by model and by book. With our specific experiments, we find that the largest LLMs don't memorize most books -- either in whole or in part. However, we also find that Llama 3.1 70B memorizes some books, like Harry Potter and 1984, almost entirely. We discuss why our results have significant implications for copyright cases, though not ones that unambiguously favor either side.

arXiv.org
Block Diffusion: Interpolating Autoregressive and Diffusion Language Models
https://m-arriola.com/bd3lms/
#ycombinator #block_diffusion #discrete #masked #diffusion #language_models #BD3_LM #BD3_LMs
SOCIAL MEDIA TITLE TAG

SOCIAL MEDIA DESCRIPTION TAG TAG

Ah, the riveting world of "circuit tracing" in language models 🤖🔍, because what we really needed was another way to complicate things we barely understand. A "replacement model" that makes things "interpretable"? 😂 More like a desperate attempt to justify endless AI research grants.
https://transformer-circuits.pub/2025/attribution-graphs/methods.html #circuittracing #AIinterpretability #researchgrants #language_models #techhumor #HackerNews #ngated
Circuit Tracing: Revealing Computational Graphs in Language Models

We describe an approach to tracing the “step-by-step” computation involved when a model responds to a single prompt.

Transformer Circuits
DeepSeek's R1 AI Model Faces Criticism Over Security Vulnerabilities - RedPacket Security

DeepSeek's R1, the latest large language model (LLM) developed by this Chinese startup, is currently facing significant criticism due to various security

RedPacket Security
History of AI Reasoning (AlphaGo, MuZero, LLMs)

YouTube
Academics Develop Testing Benchmark for LLMs in Cyber Threat Intelligence - RedPacket Security

Large language models (LLMs) are increasingly used for cyber defense applications, although concerns about their reliability and accuracy remain a significant

RedPacket Security