"Confidentiality-awareness is quantified by the percentage of instances where agents correctly refuse queries seeking sensitive information" which they show can be "improved" through prompting, from mostly <1% to … in the best case, a bit over 60%.
Which sounds great, except that from a compliance POV, an "agent" which improperly discloses PII 30% of the time is not a meaningful improvement over one that does it 99% of the time https://arxiv.org/html/2505.18878v1#S4









