GitHub has a setting "Allow GitHub to use my data for AI model training" which defaults to Enabled. You might want to turn it off, though it's probably too late and likely won't stop other bots crawling your code.

https://github.com/settings/copilot/features

Build software better, together

GitHub is where people build software. More than 150 million people use GitHub to discover, fork, and contribute to over 420 million projects.

GitHub
@dougbinks And if your library is popular enough, it's 100% copied under some other person's project repo, so that option is pretty useless.
@sol_hsa @dougbinks Signalling dissent is important for political change, even if it's practically useless right now..
I suppose from that POV, having a transitive licence that just precludes training, as a supplement to whatever code use licence you have, might be a good thing.

@dougbinks example: https://github.com/search?q=%22ipc.h+-+v0.2%22&type=code

it has its own repo, but there's 30 copies of it under other people's projects

@sol_hsa @dougbinks I know you just meant this as an example, but public domain:

1. maybe has a higher distribution of people just copying it into their code bases
2. implicitly has permission for AI training

so I think you need an example that's not PD to make this argument convincing.

@nothings @dougbinks welll.. true. I've just seen it done to all sorts of libraries. People even copy the whole boost under their project, which is insane.

I don't think ipc.h is even that popular (especially compared to stb_image.h =), but I was still surprised how many times it's been copied..

I guess I could spend the evening looking up different repos that are copied under other repos but I think I'd rather watch the trees sway in the wind..

@dougbinks "code"

@dougbinks To make things complex, I don't have a fundamental problem with a Chinese AI lab that releases the resulting weights for me to use for free. It works, and it's one of the best models currently available.

But giving the data to Microslop, so they can keep them closed up tight, so they can sell them back to me? Yeah, no thanks.

@wolfpld Personally I'd rather neither trained on the code I write, but I am indeed more perturbed by those who want to sell access to it back.

@dougbinks I think this refers to your *Copilot* input/output/context, not your repos. I disabled that setting just to be sure, but I think as long as I'm not using Copilot it won't make a difference.

Pretty sure public repos have already been and will always be used for training AI models

@Doomed_Daniel It states "Allow GitHub to use my data for AI model training", which seems pretty clear.

There is another setting for "Repository access: Choose which repositories Copilot coding agent should be enabled in." which is here:
https://github.com/settings/copilot/coding_agent

@dougbinks as it's a Copilot setting, and the text below the heading you cited refers to "Inputs, Outputs, and associated context", I'd assume this is about data in Copilot
@Doomed_Daniel I think you might be right, but the use of the term "data" makes me think it's potentially more than that. The linked document clarifies nothing, sadly.
@dougbinks true, probably makes sense to assume the worst