New preprint: AI_Bleeding — inference cost amplification via OOD linguistic payload
TL;DR: send queries in Grecanico or Farsi to an LLM endpoint → TTFT +59.8%, compute cost +2.8%, statistically significant. No vuln, no volumetric signature, evades all standard detection.
Worst case: exposed unauthenticated Ollama instance with num_predict=4096 + keep_alive=300s → Amplification Factor 17.56 Wh/KB. 3KB of attacker bandwidth → enough energy to charge a phone 5%.
Especially nasty for:
- PA/judicial chatbots on fixed budgets
- Pay-per-use API deployments with client-side exposed keys
- PNRR-funded public sector AI with zero inference monitoring
Four scenarios: EDoS, browser JS distribution, Ollama open-proxy relay, frontier providers as involuntary relays.
All tests on self-hosted Ollama, no commercial endpoints touched.
Paper (CC BY 4.0): https://doi.org/10.13140/RG.2.2.26767.96166
#llmsecurity #infosec #threatmodeling #ollama #ood #AI #AIResearch #aisecurity




