
hmokiguess Mar 18

Nvidia NemoClaw

https://github.com/NVIDIA/NemoClaw

GitHub - NVIDIA/NemoClaw: Run OpenClaw more securely inside NVIDIA OpenShell with managed inference



Netcob Mar 18

Am I missing something? Why is everyone talking about sandboxes when it comes to OpenClaw?

To me it's like giving your dog a stack of important documents, then being worried he might eat them, so you put the dog in a crate, together with the documents.

I thought the whole problem with that idea was that in order for the agent to be useful, you have to connect it to your calendar, your e-mail provider and other services so it can do stuff on your behalf, which also lets it create chaos and destruction.

And now, what, having inference done by Nvidia directly makes it better? Does their hardware prevent an AI from deleting all my emails?


simple10

Yeah, it's wild. I spent several weeks nearly full time on a deep dive of claw architecture & security.

The short of it - OpenClaw sandboxes are useful for controlling what sub-agents can do, and what they have access to. But it's a security nightmare.

During config experiments, I got hit with a $20 Anthropic API charge from one request that ran amok. A misconfigured security sandbox resulted in Opus getting crazy creative to find workarounds. 130 tool calls and several million tokens later... it was able to escape the sandbox. It used a mix of dom-to-image sending pixels through the context window, then writing scripts in various sandboxes to piece together a full jailbreak. And I wasn't even running a security test. It was just a simple chat request that ran into sandbox firewall issues.

Currently, I use sandboxes to control which agents (i.e. which system prompts) have access to different tools and data. It's useful, but tricky.
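As an illustration only, the per-agent restriction described above can be sketched as a deny-by-default tool allowlist. This is not OpenClaw's actual configuration format or API; `AgentSandbox`, `ToolNotAllowed`, and the two tools are hypothetical names made up for the example.

```python
from dataclasses import dataclass, field


class ToolNotAllowed(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


@dataclass
class AgentSandbox:
    # Hypothetical structure for illustration; not OpenClaw's real config.
    name: str
    allowed_tools: frozenset = field(default_factory=frozenset)

    def call(self, tool_name, registry, *args, **kwargs):
        # Deny by default: anything not explicitly listed is rejected.
        if tool_name not in self.allowed_tools:
            raise ToolNotAllowed(f"{self.name} may not call {tool_name}")
        return registry[tool_name](*args, **kwargs)


# Hypothetical tools standing in for real integrations.
TOOLS = {
    "read_calendar": lambda: ["standup 09:00"],
    "delete_email": lambda msg_id: f"deleted {msg_id}",
}

# The scheduling agent can read the calendar but cannot touch email.
scheduler = AgentSandbox("scheduler", frozenset({"read_calendar"}))
```

With this shape, `scheduler.call("read_calendar", TOOLS)` succeeds, while `scheduler.call("delete_email", TOOLS, "42")` raises `ToolNotAllowed`. The check lives outside the model, so it does not depend on the system prompt being obeyed.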


epaga Mar 19

> It used a mix of dom-to-image sending pixels through the context window, then writing scripts in various sandboxes to piece together a full jailbreak.

That would be one interesting write-up if you ever find the time to gather all the details!

It's on my claw list to write a blog post. I just keep taking down my claws to make modifications. lol

Here are the full (unedited) details, including many of the Claude Code debugging sessions that dug into the logs to figure out what happened:

https://github.com/simple10/openclaw-stack/blob/caf9de2f1c0c...

And here's a summary a friend did on a fork of my project:

https://github.com/proclawbot/openclaude/blob/caf9de2f1c0c54...

The full version has all the build artifacts Opus created to perform the jailbreak.

It also has some thoughts on how this could (and will) be used for pwn'ing OpenClaws.

The key takeaway: the default OpenClaw setup has little to no guardrails. It's just a huge list of tools given to an LLM (Opus) along with a user request. What's particularly interesting is that the 130 tool calls never once triggered any of Opus's safety precautions. From its perspective, it was just given a task, an unlimited budget, and a bunch of tools to try to accomplish the job. It effectively runs in ralph mode.

So any prompt injection (e.g. from an ingested email or reddit post) can quickly lead to internal data exfiltration. If you run a claw without good guardrails & observability, you're effectively creating a massive attack surface and providing attackers all the compute and API token funding to hack yourself. This is pretty much the pain point NemoClaw is trying to address. But it's a tricky tradeoff.
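One simple guardrail against the "unlimited budget" problem is a hard per-request cap on tool calls and tokens, enforced outside the model. The sketch below is a generic illustration, not something OpenClaw or NemoClaw is known to ship; `RequestBudget`, `BudgetExceeded`, `run_agent`, and the default limits are all hypothetical.

```python
class BudgetExceeded(Exception):
    """Raised when a request exhausts its tool-call or token budget."""


class RequestBudget:
    # Hypothetical guard; the default limits are illustrative, not recommendations.
    def __init__(self, max_tool_calls=25, max_tokens=200_000):
        self.max_tool_calls = max_tool_calls
        self.max_tokens = max_tokens
        self.tool_calls = 0
        self.tokens = 0

    def charge(self, tokens_used):
        """Record one tool call and its token cost; raise once over budget."""
        self.tool_calls += 1
        self.tokens += tokens_used
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(
                f"tool calls {self.tool_calls} > {self.max_tool_calls}"
            )
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"tokens {self.tokens} > {self.max_tokens}")


def run_agent(step_costs, budget):
    """Simulate an agent loop: each step is one tool call with a token cost."""
    completed = 0
    try:
        for cost in step_costs:
            budget.charge(cost)  # check the budget before doing the work
            completed += 1
    except BudgetExceeded as exc:
        return completed, str(exc)
    return completed, None
```

Under these assumed limits, a run like the 130-call one described above would be cut off after 25 tool calls instead of running to completion. Logging every `charge` call would also give a basic audit trail for the observability side.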

openclaw-stack/notes/logs/runaway-openclaw-prompt-hack.md at caf9de2f1c0c54a16bfe1cc4f58a9948d442c668 · simple10/openclaw-stack

Deploy a secure OpenClaw to any VPS using Claude Code. - simple10/openclaw-stack
