I asked Claude to implement a Rust port of a JS library, given the source code in another directory. There's already one such implementation in OSS: mine. Result?

The AI blatantly plagiarised my OSS code, including parts that were not present in the source it was told to port.

After I prohibited it from touching outside implementations and told it to focus on translating the local directory... it ignored the command and plagiarised my work again. It did the same with a C# port and its one existing impl.

It's a sort of eye-opening experience: as a person very familiar with the plagiarised source, you can see how the LLM is stitching together fragments of code seen somewhere else. It's just that most of the time we don't know the original code that the AI reused and cannot notice the stitches, so we consider it to be original writing.

gl;hf to all the people using AI to write code in a domain where GPL source was available.

@horusiath
> It's a sort of eye-opening experience: as a person very familiar with the plagiarised source, you can see how the LLM is stitching together fragments of code seen somewhere else

Hey @conservancy, here's smoking-gun evidence of the risk of generated code violating copyleft license conditions, which you talk about here:

https://giveupgithub.com/

@horusiath Their fundamental algorithm is to reproduce characteristics of the text they're trained on. That means writing words (which can be the same or a synonym) in the same order. In formal languages such as programming languages there are not many synonyms (that's the point: be concise and unambiguous). Dependencies across code blocks quickly constrain the possible word chainings.

So… basically they can only be a sophisticated code retrieval system.

@Fedihacker IMO you underestimate how many ways there are to solve the same problem.

Besides I'm talking about:
- Porting types that were not present in the source, but existed in past versions & in my code.
- Porting names that didn't exist in the source, but exist in my implementation.
- Using highly un-idiomatic design choices, not present in the source. Tbh, I haven't found them anywhere outside my lib.

It's way too specific to be considered "anyone would write it this way".

@horusiath
fun fact: this is not only a problem with GPL, but almost all code licenses (except for public-domain-likes) require you to keep the license/copyright header *at least in the code*, often also in documentation shipped with binaries (e.g. MIT license)

@horusiath Generative AI should be called Derivative AI. It's a copyright/license washing machine for images, code, books...

@horusiath

Yours was the only Markov chain it had available to tug on.

@horusiath Using LLMs is a sin and your peers should laugh at you whenever they see you.

@horusiath curious, what license does your plagiarised project use? That's important.

@ralf My code uses MIT for the part the AI replicated. But that doesn't matter: pretty much every license (MIT included) requires attribution.