If you had code on GitHub at any point, it looks like it might be included in a large dataset called “The Stack”. If you want your code removed from this massive “ai” training data, go here:

https://huggingface.co/spaces/bigcode/in-the-stack

I found two of my old GitHub repos in there. Both were deleted last year and both were private. This is a serious breach of trust by GitHub and @huggingface.

Remove all your code from GitHub.

CONSENT IS NOT OPT-OUT.

Edit — thanks for all the replies. More context here: https://hachyderm.io/@joeyh/112105744123363587

Also, I’m sure the repos of mine that I found were private, but even if they were public at some point in the past, for a brief time, that isn’t my consent to use them for purposes beyond their intent.

---
Edit 2 -- I see this made it to HN, which is a level of attention I neither want nor appreciate...

For all those wondering about the private repo issue -- No, I am not 100% sure that these ancient repos weren't at some point public for a split second before I changed it. I do know that they were never meant for this and that one of them didn't even contain any code.

If my accidentally making a repo public for a moment just so happened to overlap with this scraping, then I guess that's possible. But that in no way invalidates the issues or the anger that I feel about them.

@emenel @[email protected] When I put code on GitHub, it's quite clear that I'm consenting for it to be cloned/scraped by others - indeed that's the whole point of the platform. So long as 'The Stack' aren't violating the licensing terms, and they make a point of telling users of the dataset not to, they aren't actually doing anything different from me using your code. Open source disclosure of IP carries with it the risk of unanticipated use cases you don't like.

The concerning part here is the use of private repos, though I can't find any evidence of GitHub themselves being involved in the project, which would be necessary for that to happen without illegal hacking of their platform. So this needs to be clarified and is potentially very serious.

@alastair @emenel @huggingface If your repo was public and cloned at some point, and then switched to private… you are out of luck. But that’s on you. It could be in the Internet Archive or any other archive location.