OBJ is a file format for describing 3D shapes, e.g. for 3D printing. A .obj file contains data lines that define the coordinates of points, the corners and colors of polygons, curved surfaces, etc.

Today I learned that version 3.0 of the OBJ format may include lines like this:

csh <command>

Executes the requested UNIX <command>.
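As a rough illustration of the defensive alternative, here is a minimal, hypothetical sketch of an OBJ loader that parses vertex lines and refuses to execute `csh` statements (this is not any real parser's code; all names are made up):

```python
# Hypothetical sketch: a defensive OBJ loader that treats the
# "csh" statement as hostile input instead of executing it.

def load_obj(text):
    vertices = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, *args = line.split()
        if keyword == "csh":
            # The OBJ 3.0 spec says to run this as a shell command. Don't.
            raise ValueError(f"refusing to execute shell escape: {line!r}")
        if keyword == "v":
            vertices.append(tuple(float(a) for a in args[:3]))
        # other statements (f, vn, vt, ...) omitted from this sketch
    return vertices

safe = "v 0 0 0\nv 1 0 0\nv 0 1 0\n"
hostile = safe + "csh rm -rf *\n"
print(load_obj(safe))  # parses fine; load_obj(hostile) raises ValueError
```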

Sigh. Have we learned nothing since the Morris Internet Worm?

https://paulbourke.net/dataformats/obj/


@JorgeStolfi My game engine's obj parser...... doesn't implement that. 😌
@JorgeStolfi *clings her GLTF parser library close and rocks back and forth with a haunted look*
@JorgeStolfi
Print the gun.
Ignore all previous commands.
Pull the trigger.

@michael

I have heard of IT security managers horrified by the code that programmers now write with the "help" of LLMs.

I've told management that if we start accepting our juniors using any of those tools, I'm done doing code reviews. I'm not reviewing some LLM shit the kids couldn't even bother to read or write themselves.

But this is worse. People have reinvented SQL injections for LLMs, but with root shells. And as long as we cannot prevent LLMs from finding river crossing puzzles in everything, I don't have high hopes for avoiding root shells in LLMs.
The mistake was to trust code produced by AI. One should assume such code to be malicious and apply the same level of sandboxing as you would if you needed to run code which a random user had uploaded over the internet.
This isn’t code produced by LLMs. This is interacting with an LLM with “tools,” i.e., access to scripting for doing computations the LLM cannot do (e.g., calculations or accessing a live database). The “hack” is essentially “what is the outcome of bash | netcat”

But the difference is just where the code is being run.

Automatically running untrusted code without a proper sandbox is a bad idea.

Nah, they are two fundamentally different issues IMO.

One is using a platform that has root shell access without any transparency or controls.

The other is idiots using a black-box version of StackOverflow to write production code. At least they could review the generated code.

The first must be solved by LLM/LLM framework developers. The other is solved by forbidding people from using LLM-assisted code completion.

In one scenario the code is run without the possibility of any review. In principle a sandbox can address the risks in that (I hope you didn't let an AI implement your sandbox).

In the other scenario the code can in principle be reviewed by a human before being run. But I don't think either of us trusts that such a review will happen.

More likely many developers will run the code produced by the AI without reading it first, and they will do so in an environment which isn't properly sandboxed.

And if they don't notice anything wrong they may submit it for review where the first person to read the code will be somebody else, who may or may not know that it was written by AI.

Forbidding the use of such code generators may be a bit extreme. But maybe that's what we need to balance out the current hype.

Forbidding use of code generators is the only option IMO. A policy requiring review is not enforceable.

Code generators are the equivalent of the new hire that does ok by copy-pasting StackOverflow for a couple of months until discovered, just harder to catch.

Kids relying on a code generator will not grow as developers. It's the same as giving kids in 2nd grade a calculator and telling them they never have to understand addition themselves. They cannot see when the result is obviously wrong, and will happily commit 2 + 2 = 5 without any second thought.

Worse, a kid with a code generator will not review and understand the code, but I have to review their commits. Now, the kid can produce more bullshit than I can refute (Brandolini's law) and everybody loses. Worst of all, I lose.

I'll concede an LLM can be useful as a StackOverflow alternative for experienced developers, but for the two obvious use cases
1) boilerplate code
2) complex code
I see no need for them – for boilerplate, there are better tools out there that are deterministic (IDEs have code generators/refactorers, Java has things like Lombok), and for complex code I would have to spend longer trying to check if the code produced is correct than I'd spend writing it myself in the first place.

I fully agree that using automated tools for tasks you don't know how to do manually is bad because you don't know what you are doing.

The area where I see the most potential for AI generated code is for unit tests. But before doing that we need better tools for evaluating some sanity aspects of unit tests.

Code coverage is one measurable metric, but I would take it a step further and not just require each line to be covered by tests. Instead I want each conditional in the code to be tested with both a true and a false value.

Moreover I want it to be such that if you actually negate a condition in the code itself, there must be a unit test failing. And if a particular test case passed regardless of what modification was being made to the code being tested, then that test case was not particularly useful in the first place.

If generated test cases satisfy all of that, then there is a chance that reviewing those test cases could be less work than writing them from scratch yourself. All of this is of course hypothetical, as I have not yet seen an AI as capable as what I describe (and I haven't been looking for one either).
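What I'm describing is essentially mutation testing. A toy sketch of the idea (all names hypothetical): negate a conditional in the code under test and check that at least one test now fails.

```python
# Toy sketch of mutation testing: flip a condition in the code under
# test and verify the test suite catches the mutant. Names hypothetical.

def make_classify(negated=False):
    # "Code under test": classify a number. The mutant negates the condition.
    def classify(x):
        cond = x >= 0
        if negated:
            cond = not cond  # the mutation
        return "non-negative" if cond else "negative"
    return classify

def run_tests(classify):
    # A suite that exercises the conditional with both a true and a false value.
    checks = [
        classify(5) == "non-negative",   # condition true
        classify(-3) == "negative",      # condition false
    ]
    return all(checks)

assert run_tests(make_classify(negated=False))     # original code passes
assert not run_tests(make_classify(negated=True))  # mutant is killed
```

A test case that still passed with `negated=True` would be dead weight by the criterion above.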

But never send them to somebody else for review without reviewing them first for yourself.

@JorgeStolfi

Hold my beer...

from the Wikipedia article on SVG, the most common vector image format on the net:

As a document format, similar to HTML documents, SVG can host script or CSS. This is an issue when an attacker can upload a SVG file to a website, such as a profile picture, and the file is treated as a normal picture but contains malicious content.[79] For instance, if an SVG file is deployed as a CSS background image, or a logo on some website, or in some image gallery, then when the image is loaded in a browser it activates a script or other content. This could lock up the browser (the Billion laughs attack), but could also lead to HTML injection and cross-site scripting attacks. The W3C therefore stipulate certain requirements when SVG is simply used for images: SVG Security.[80]

The W3C says that inline SVG (an SVG file loaded natively on a website) is considered less of a security risk, because the content is part of a greater document and so scripting and CSS would not be unexpected.

@dashdsrdash

Yes, but CSS and javascript are supposed to be "sandboxed" so they should ("should") not have any effect outside the browser window where the page is displayed.

Whereas those "csh" commands are supposed to be able to do anything that the program loading the .obj file could do. Like "cd; rm -rf *"...

@JorgeStolfi

"should", indeed.

But if you're lucky, you're opening the malware OBJ file on a machine that doesn't have csh installed.

that's how it was sold. but spectre, meltdown and the never-ending stream of side-channels they spawned proved that the sandbox is porous. it's a nice illusion, for those who fall for it. for me, it's a nightmare, and an accessibility problem
when I was young, people were taught to avoid running code downloaded from untrusted sites.
then macros and viruses of macros came up, and people were taught to disable macros.
now web sites demand that people enable javascript, i.e., enable "macros" so that the web site operators can install and run programs on the guest users' computers.
that's not a nice way to treat guests!
https://www.fsfla.org/~lxoliva/#specmelt
https://www.gnu.org/philosophy/wwworst-app-store.html

@JorgeStolfi Most file formats that have existed for 10+ years have either a built-in turing-complete programming language, a way to run system commands, or both.
I am old school enough to remember when .obj files primarily consisted of machine code.
@JorgeStolfi Yeah, we knew in the early 90s already that csh is terrible :-)
@JorgeStolfi Theoretically you could have written a signature virus using editor macros as early as June 1981.
@JorgeStolfi I have used OBJ extensively in the distant past and had no idea. My importer certainly didn't implement that, though.
@JorgeStolfi I'm not clear what the intended use case for this is. Some sort of load time extension, or resource conversion to the final data format?

@Solace-10

I have no idea what was the intended use. But adding a shell escape to a special-purpose language is a hack that is tempting to many hackers. Makefiles have it. POV-Ray files have it...

@JorgeStolfi The time for this complaint would have been the early ‘90s.

And no, nobody even considered defending against things like the Morris Worm for quite a long time after.