GitHub has a setting "Allow GitHub to use my data for AI model training" which defaults to Enabled. You might want to turn it off, though it's probably too late and likely won't stop other bots crawling your code.

https://github.com/settings/copilot/features

Build software better, together

GitHub is where people build software. More than 150 million people use GitHub to discover, fork, and contribute to over 420 million projects.

GitHub
@dougbinks And if your library is popular enough, it's 100% copied under some other person's project repo, so that option is pretty useless.
@sol_hsa @dougbinks Signalling dissent is important for political change, even if it's practically useless right now..
I suppose from that POV, having a transitive licence that just precludes training, as a supplement to whatever code use licence you have, might be a good thing.

@dougbinks example: https://github.com/search?q=%22ipc.h+-+v0.2%22&type=code

it has its own repo, but there are 30 copies of it under other people's projects

@sol_hsa @dougbinks I know you just meant this as an example, but public domain:

1. maybe has a higher distribution of people just copying it into their code bases
2. implicitly has permission for AI training

so I think you need an example that's not PD to make this argument convincing.

@nothings @dougbinks welll.. true. I've just seen it done to all sorts of libraries. People even copy the whole of Boost under their project, which is insane.

I don't think ipc.h is even that popular (especially compared to stb_image.h =), but I was still surprised how many times it's been copied..

I guess I could spend the evening looking up different repos that are copied under other repos but I think I'd rather watch the trees sway in the wind..