
Another reason why I prefer b&w #photography? Both are calibrated(!) #BenQ monitors (albeit different models). According to the measuring device, both are supposedly perfectly calibrated and yet they look that bloody different ... 🤯. Oh, ffs!

#calibration #i1Profiler #Xrite

jwst 2.0.1
https://atlas.whatip.xyz/post.php?slug=jwst-201
Library for calibration of science observations from the James Webb Space Telescope
#observations #calibration #library #science

How do we make LLM output more trustworthy? A short survey note on three lines of recent work covering five papers: conformal-prediction coverage guarantees, behavioral calibration of the model's prose, and sample-disagreement detection. All three pay the same multi-sample inference tax; the choice is about what you want back.

https://benjaminhan.net/posts/20260505-llm-uncertainty-survey/?utm_source=mastodon&utm_medium=social

#Hallucination #LLMs #Calibration #ConformalPrediction #AI

How to Make LLM Output More Trustworthy – synesis

A short survey of three approaches for mitigating hallucination in large language models: formal coverage guarantees via conformal prediction, behavioral calibration of the model’s prose, and post-hoc detection of unreliable outputs.

synesis

Semantic Entropy (Nature 2024) detects LLM confabulations by clustering sampled answers by meaning and computing entropy over the cluster distribution. "Paris" and "It's Paris" cluster together, so paraphrase noise doesn't inflate the signal. Cost: it only catches hallucinations that vary across samples. If the model is consistently wrong, all samples cluster and the detector says "confident".
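
A toy illustration of the cluster-then-entropy idea in the post above. The post does not say how answers are grouped by meaning, so `meaning_key` below is a hypothetical stand-in that strips punctuation and a few filler words; a real system would need a proper semantic-equivalence check.

```python
import math
import string
from collections import Counter

# Hypothetical filler words, chosen only for this toy example.
FILLER = {"its", "it", "is", "the", "answer"}

def meaning_key(answer):
    """Toy stand-in for a meaning-equivalence check (an assumption, not the paper's method)."""
    cleaned = answer.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in cleaned.split() if w not in FILLER)

def semantic_entropy(samples):
    """Entropy over meaning clusters rather than over surface strings."""
    counts = Counter(meaning_key(s) for s in samples)
    n = len(samples)
    # sum of p * log(1/p) over clusters
    return sum((c / n) * math.log(n / c) for c in counts.values())

# Paraphrases fall into one cluster, so they add no entropy.
print(semantic_entropy(["Paris", "It's Paris", "Paris."]))
# Genuine disagreement across samples raises the entropy.
print(semantic_entropy(["Paris", "Lyon", "Marseille"]))
# A consistently wrong model also yields a single cluster.
print(semantic_entropy(["Lyon", "It's Lyon", "Lyon."]))
```

The last call shows the blind spot the post mentions: if every sample agrees on the same wrong answer, the entropy is zero and the detector reports confidence.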

https://benjaminhan.net/posts/20260505-semantic-entropy/?utm_source=mastodon&utm_medium=social

#LLMs #Calibration #Hallucination #Nature #AI

Detecting Hallucinations in Large Language Models Using Semantic Entropy – synesis

A Nature 2024 method for detecting a subset of LLM hallucinations — confabulations — by computing entropy over the meaning of sampled answers, not the surface token sequence.


Linguistic Calibration trains Llama 2 to emit confidence phrases that let a downstream reader make calibrated forecasts on related questions. The key move is defining calibration through reader utility instead of self-reported probability. Hedged text that doesn't help the reader makes no forecasting progress, so generic hedging can't game the objective.
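
A small sketch of why reader-utility scoring resists generic hedging. It assumes the reader's forecast is scored with a log scoring rule (the post does not name the rule), and the three forecasts are hypothetical examples of what a reader might infer from vague, well-calibrated, and overconfident text.

```python
import math

def expected_log_score(forecast, truth):
    """Expected log score (higher is better) of a forecast under the true distribution."""
    return sum(p * math.log(forecast[answer]) for answer, p in truth.items() if p > 0)

# Hypothetical ground truth for a two-way question.
truth = {"A": 0.8, "B": 0.2}

# Generic hedging tells the reader nothing, so they stay at a uniform guess.
vague_reader = {"A": 0.5, "B": 0.5}
# Calibrated confidence phrases let the reader match the true odds.
calibrated_reader = {"A": 0.8, "B": 0.2}
# Overconfident text pushes the reader too far.
overconfident_reader = {"A": 0.99, "B": 0.01}

for name, forecast in [("vague", vague_reader),
                       ("calibrated", calibrated_reader),
                       ("overconfident", overconfident_reader)]:
    print(name, round(expected_log_score(forecast, truth), 3))
```

Under these assumptions the calibrated reader scores best, the uniform reader scores worse, and the overconfident reader scores worst, which is the behavior a proper scoring rule is meant to reward.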

https://benjaminhan.net/posts/20260505-linguistic-calibration/?utm_source=mastodon&utm_medium=social

#LLMs #Calibration #Hallucination #ICML #AI

Linguistic Calibration of Long-Form Generations – synesis

A two-stage recipe (summary-distillation SFT followed by decision-based RL) trains Llama 2 7B to emit long-form text whose confidence phrases let readers make calibrated probabilistic forecasts about downstream questions.


Conformal Factuality casts LM correctness as an uncertainty quantification problem. Decompose the answer into sub-claims, score each, and drop the low-confidence ones until the retained set is fully factual with probability of roughly 1-α. The sub-claim decomposition does most of the work; the conformal machinery rides on top. Atomic-claim splitters have known failure modes, and the guarantee inherits them.
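
A simplified sketch of the filter-by-threshold step, assuming each sub-claim already comes with a confidence score and, for calibration data, a correctness label. The function names and the data layout are illustrative assumptions, not the paper's code, and the claim splitter and scorer are left out entirely.

```python
import math

def calibrate_threshold(calibration, alpha):
    """calibration: list of examples, each a list of (score, is_correct) sub-claims."""
    # Per example: the highest score assigned to an incorrect claim. Keeping only
    # claims that score strictly above it leaves an all-correct answer.
    worst = sorted(
        max((score for score, ok in example if not ok), default=float("-inf"))
        for example in calibration
    )
    n = len(worst)
    k = math.ceil((n + 1) * (1 - alpha))  # conformal quantile index (1-based)
    if k > n:
        return float("inf")  # too few calibration examples: retain nothing
    return worst[k - 1]

def filter_claims(claims, threshold):
    """claims: list of (text, score). Keep claims scoring strictly above the threshold."""
    return [text for text, score in claims if score > threshold]

# Hypothetical calibration data: three answers with labeled sub-claims.
calibration = [
    [(0.9, True), (0.4, False)],
    [(0.8, True), (0.7, True)],
    [(0.6, False), (0.95, True)],
]
threshold = calibrate_threshold(calibration, alpha=0.5)
print(filter_claims([("claim a", 0.9), ("claim b", 0.4), ("claim c", 0.3)], threshold))
```

Note that the threshold is only as good as the sub-claim labels: if the splitter merges or mangles claims, the calibration scores are computed on the wrong units.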

https://benjaminhan.net/posts/20260505-conformal-factuality/?utm_source=mastodon&utm_medium=social

#ConformalPrediction #Calibration #Hallucination #LLMs #ICML #AI

Language Models with Conformal Factuality Guarantees – synesis

A framework that turns a correctness guarantee for LM outputs into a conformal prediction problem, backing off to less specific claims until the error rate falls below a target threshold.


A primer on conformal prediction: the recipe for distribution-free coverage guarantees that doesn't require your model to be calibrated. Rank-based non-conformity scores plus a calibration quantile give you valid prediction sets. Easy inputs get one-class sets; hard ones get many alternatives. Set size is where the uncertainty shows up.
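
A minimal split-conformal sketch for classification. It assumes the non-conformity score is one minus the probability the model assigns to a label (the post does not fix a score function), and the calibration scores and class probabilities are made-up numbers. The coverage argument depends only on where a new score ranks among the calibration scores, not on the probabilities being calibrated.

```python
import math

def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: every label is included
    return sorted(scores)[k - 1]

def prediction_set(probs, qhat):
    """Include every label whose non-conformity score 1 - p is at most qhat."""
    return {label for label, p in probs.items() if 1 - p <= qhat}

# Hypothetical calibration scores: 1 - p(true label) on held-out examples.
calibration_scores = [0.1, 0.3, 0.6]
qhat = conformal_quantile(calibration_scores, alpha=0.25)

easy = {"cat": 0.90, "dog": 0.07, "fox": 0.03}
hard = {"cat": 0.45, "dog": 0.42, "fox": 0.13}
print(prediction_set(easy, qhat))  # a single label
print(prediction_set(hard, qhat))  # two labels
```

The confident input gets a one-label set and the ambiguous input gets two, so the set size carries the uncertainty.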

https://benjaminhan.net/posts/20260505-conformal-prediction-primer/?utm_source=mastodon&utm_medium=social

#ConformalPrediction #Calibration #UncertaintyQuantification #AI

A Primer on Conformal Prediction – synesis

Distribution-free coverage guarantees via a calibration quantile, how set size encodes uncertainty, and why rank-based scoring gets you validity without model calibration.


Scale buys calibration only in the format you tested in. Lettered multiple choice works at 52B; swap one option for "none of the above" and it collapses; True/False rephrasing restores it. RLHF policies need a temperature fix. "Do I know this?" heads miscalibrate out of distribution. Three years on, SelfReflect lands on the same conclusion: only sampling your own answers and summarizing gets an LLM to describe its own distribution.
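
A generic temperature-scaling sketch to show what a "temperature fix" can look like: divide the logits by a single scalar chosen to minimize negative log-likelihood on held-out data. The grid search, the logits, and the labels are illustrative assumptions; this is not necessarily the procedure used in the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (numerically stabilized)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(examples, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(logits, temperature)[label]) for logits, label in examples]
    return sum(losses) / len(losses)

def fit_temperature(examples, grid):
    """Pick the temperature from the grid with the lowest held-out NLL."""
    return min(grid, key=lambda t: nll(examples, t))

# Hypothetical overconfident model: about 98% confident, but right only 3 times out of 4.
examples = [([4.0, 0.0], 0)] * 3 + [([4.0, 0.0], 1)]
best_t = fit_temperature(examples, grid=[0.5, 1.0, 2.0, 3.0, 4.0, 5.0])
print(best_t, softmax([4.0, 0.0], best_t)[0])
```

A fitted temperature above 1 flattens the distribution, pulling the stated confidence back toward the observed accuracy without changing which answer ranks first.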

https://benjaminhan.net/posts/20260504-kadavath-mostly-know/?utm_source=mastodon&utm_medium=social

#LLMs #Calibration #AISafety #Anthropic #AI

Language Models (Mostly) Know What They Know – synesis

Large language models can self-evaluate whether their own samples are correct, and can be trained to predict whether they know an answer before giving one.
