So I just learned what "The Stack" is today: a dataset of GitHub repos, aggregated for training machine learning models, from which I can opt out.

But I won't.

I won't because they scraped some hot garbage I wrote in bash and Python that would make you faint. Bottom-of-the-barrel throw-away scripts full of coding crimes. Stuff like

find | grep | awk | xargs | ugh

...invoked via subprocess.run() then fed into more garbage.

I want "artificial intelligence" to learn this. It's going to be fantastic.

@gabrielesvelto I don't think that AI companies actually care about the quality of the code their systems spew. The whole point is that 'it works' (even when it doesn't), not that a human would be able to modify it later.
@mdione it's not just that. Most code you'll find out there has notable bad patterns, a very common one being that errors are mostly ignored. Since LLM training gives disproportionate weight to common patterns, the output will consistently reproduce the bad ones. That makes the output unstable and insecure by design, not just unmaintainable.
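
To make the "ignoring errors" pattern concrete, here is a minimal sketch in Python built around subprocess.run(), which came up earlier in the thread. Everything named here (FAILING_CMD, run_ignoring_errors, run_checking_errors) is hypothetical and made up for illustration; the failing command is just a child Python process that exits with status 3, standing in for any broken pipeline.

```python
import subprocess
import sys

# Hypothetical failing command: a child Python process that exits with status 3.
FAILING_CMD = [sys.executable, "-c", "import sys; sys.exit(3)"]


def run_ignoring_errors(cmd):
    """The common pattern: run the command and never look at the exit status."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    # On failure stdout is simply empty, and the caller cannot tell why.
    return result.stdout


def run_checking_errors(cmd):
    """Same call with check=True: a non-zero exit raises CalledProcessError."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# The failure passes silently and looks like "no output".
silent_output = run_ignoring_errors(FAILING_CMD)

# The failure surfaces as an exception carrying the exit status.
try:
    run_checking_errors(FAILING_CMD)
    failure_detected = False
except subprocess.CalledProcessError as err:
    failure_detected = True
    exit_status = err.returncode
```

The two functions differ by a single keyword argument, which is probably part of why the first form is so common: it is the default, and it looks like it works right up until it doesn't.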