OBJ is a file format for describing 3D shapes, e.g. for 3D printing. A .obj file contains data lines that define point coordinates, the corners and colors of polygons, curved surfaces, etc.

Today I learned that version 3.0 of the OBJ format may include lines like this:

csh <command>

Executes the requested UNIX <command>.
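A parser should treat such directives as hostile input and refuse to act on them. A minimal sketch of that defensive stance (a hypothetical line filter, not a full OBJ implementation; the keyword whitelist is deliberately incomplete):

```python
# Hypothetical OBJ line filter: pass through geometry keywords we
# recognize, reject anything that requests command execution.
# Illustrates "fail closed"; not a complete OBJ parser.

SAFE_KEYWORDS = {"v", "vt", "vn", "f", "o", "g", "s", "mtllib", "usemtl"}

def filter_obj_lines(lines):
    safe, rejected = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            safe.append(line)          # comments and blanks are harmless
            continue
        keyword = stripped.split()[0]
        if keyword == "csh":
            rejected.append(line)      # never execute embedded commands
        elif keyword in SAFE_KEYWORDS:
            safe.append(line)
        else:
            rejected.append(line)      # unknown keyword: fail closed
    return safe, rejected

safe, rejected = filter_obj_lines([
    "v 0.0 0.0 0.0",
    "csh rm -rf /",
    "f 1 2 3",
])
```

The point is that execution is simply not an operation a geometry loader should ever perform, no matter what the file asks for.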

Sigh. Have we learned nothing since the Morris Internet Worm?

https://paulbourke.net/dataformats/obj/

Object Files (.obj)

@michael

I have heard of IT security managers horrified by the code that programmers now write with the "help" of LLMs.

I've told management that if we start accepting our juniors using any of those tools, I'm done doing code reviews. I'm not reviewing some LLM shit the kids couldn't even bother to read or write themselves.

But this is worse. People have reinvented SQL injections for LLMs, but with root shells. And as long as we cannot prevent LLMs from finding river crossing puzzles in everything, I don't have high hopes for avoiding root shells in LLMs.
The mistake was to trust code produced by AI. One should assume such code to be malicious and apply the same level of sandboxing as you would if you needed to run code that a random user had uploaded over the internet.
This isn’t code produced by LLMs. This is interacting with an LLM with “tools,” i.e., access to scripting for doing computations the LLM cannot do (e.g., calculations or accessing a live database). The “hack” is essentially “what is the outcome of bash | netcat”

But the difference is just where the code is being run.

Automatically running untrusted code without a proper sandbox is a bad idea.

Nah, they are two fundamentally different issues IMO.

One is using a platform that has root shell access without any transparency or controls.

The other is idiots using a black-box version of StackOverflow to write production code. At least they could review the generated code.

The first must be solved by LLM/LLM-framework developers. The other is solved by forbidding people from using LLM-assisted code completion.

In one scenario the code is run without the possibility of any review. In principle a sandbox can address the risks in that (I hope you didn't let an AI implement your sandbox).

In the other scenario the code can in principle be reviewed by a human before being run. But I don't think either of us trusts that such a review will happen.

More likely many developers will run the code produced by the AI without reading it first, and they will do so in an environment which isn't properly sandboxed.

And if they don't notice anything wrong they may submit it for review where the first person to read the code will be somebody else, who may or may not know that it was written by AI.

Forbidding the use of such code generators may be a bit extreme. But maybe that's what we need to balance out the current hype.

Forbidding use of code generators is the only option IMO. A policy requiring review is not enforceable.

Code generators are the equivalent of the new hire who does OK by copy-pasting StackOverflow for a couple of months until discovered, just harder to catch.

Kids relying on a code generator will not grow as developers. It's the same as giving kids in 2nd grade a calculator and telling them they never have to understand addition themselves. They cannot see when the result is obviously wrong, and will happily commit 2 + 2 = 5 without a second thought.

Worse, a kid with a code generator will not review and understand the code, but I have to review their commits. Now, the kid can produce more bullshit than I can refute (Brandolini's law) and everybody loses. Worst of all, I lose.

I'll concede an LLM can be useful as a StackOverflow alternative for experienced developers, but for the two obvious use cases
1) boilerplate code
2) complex code
I see no need for them – for boilerplate there are better tools that are deterministic (IDEs have code generators/refactorers, Java has things like Lombok), and for complex code I would spend longer checking whether the generated code is correct than I'd spend writing it myself in the first place.

I fully agree that using automated tools for tasks you don't know how to do manually is bad because you don't know what you are doing.

The area where I see the most potential for AI generated code is for unit tests. But before doing that we need better tools for evaluating some sanity aspects of unit tests.

Code coverage is one measurable metric, but I would take it a step further and not just require each line to be covered by tests. Instead I want each conditional in the code to be tested with both a true and a false value.
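The gap between line coverage and branch coverage shows up even in trivial code. In this made-up example, a single test reaches 100% line coverage, yet the false side of the conditional (skipping the discount) is never exercised; branch coverage (e.g. `coverage run --branch` with coverage.py) would flag it:

```python
# Hypothetical function: an if-statement with no else branch.
def apply_discount(price, is_member):
    if is_member:
        price *= 0.9
    return price

# This one call executes every line of apply_discount,
# so plain line coverage reports 100%...
assert apply_discount(100, True) == 90.0

# ...but only this second call exercises the false branch,
# where the condition is skipped entirely:
assert apply_discount(100, False) == 100
```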

Moreover, I want it to be such that if you negate a condition in the code itself, some unit test must fail. And if a particular test case passes regardless of what modification is made to the code being tested, then that test case was not particularly useful in the first place.
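What's described here is essentially mutation testing: mutate the code (for instance, negate a condition), rerun the suite, and demand that at least one test fails, i.e. "kills the mutant." A toy hand-rolled sketch of the idea (real tools such as mutmut automate the mutation step; all names here are made up):

```python
# Toy mutation-testing sketch: run the same test suite against the
# original function and against a mutant whose condition is negated.
# A useful suite passes on the original and fails on the mutant.
import operator

def make_is_adult(cmp):
    # cmp is the comparison used in the conditional; the mutant
    # replaces >= with its negation <.
    def is_adult(age):
        return cmp(age, 18)
    return is_adult

def run_suite(is_adult):
    try:
        assert is_adult(18) is True
        assert is_adult(17) is False
        return True          # suite passed
    except AssertionError:
        return False         # suite failed -- mutant killed

original = make_is_adult(operator.ge)   # age >= 18
mutant   = make_is_adult(operator.lt)   # negated: age < 18

assert run_suite(original) is True      # tests pass on the real code
assert run_suite(mutant) is False       # tests kill the mutant
```

A test suite where some mutant survives (the suite still passes) has exactly the weakness described above: a test that cannot distinguish the code from its negation.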

If generated test cases satisfy all of that, then there is a chance that reviewing those test cases could be less work than writing them from scratch yourself. All of this is of course hypothetical, as I have not yet seen an AI as capable as what I describe (and I haven't been looking for one either).

But never send them to somebody else for review without reviewing them first for yourself.