"GitHub’s Copilot will use you as AI training data, but you can opt out"

"...if you’ve used the code completion in Visual Studio Code, asked Copilot a question on the GitHub website, or used another related AI feature, your interactions and code snippets could be harvested...."

https://www.howtogeek.com/githubs-copilot-will-use-you-as-ai-training-data-but-you-can-opt-out/

#ai #microsoft #copilot

@ai6yr I’ve assumed all along that ALL the code stored in GitHub has been used to train their LLM. Does anyone believe that is not the case?

@patmikemid @ai6yr It was always the case, because early versions of Copilot would happily suggest other people’s API keys etc. if they had been accidentally committed.

@SecureOwl @patmikemid LOL there are so many keys in GitHub. I imagine people are already automatically scraping them for nefarious purposes.

@ai6yr @patmikemid Oh yeah, 100% - when I was running security for an IoT platform (yes, we had security), I used to scrape defensively as well and reach out to people who had committed API keys to our platform by accident, before they could be used by bad actors.

GitHub has a program that will autodetect them too, but you have to commit to using a unique key format so their regexes can be more reliable.
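To illustrate why a unique key format helps: with a distinctive prefix, detection is a plain regex match instead of an error-prone entropy heuristic. A minimal sketch, assuming a hypothetical `acme_` prefix plus 32 hex characters (not GitHub's actual patterns):

```python
import re

# Hypothetical key format: a fixed "acme_" prefix followed by 32 hex
# characters. A distinctive prefix like this is what makes automated
# scanning reliable -- generic "high-entropy string" checks produce
# far more false positives.
KEY_PATTERN = re.compile(r"\bacme_[0-9a-f]{32}\b")

def find_keys(text: str) -> list[str]:
    """Return every substring of `text` that matches the key format."""
    return KEY_PATTERN.findall(text)

sample = 'config = {"api_key": "acme_0123456789abcdef0123456789abcdef"}'
print(find_keys(sample))  # the leaked key is flagged
```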

@SecureOwl @ai6yr @patmikemid

It is called "secret scanning" on GitHub (and Azure DevOps).

You can achieve a similar, and perhaps better, result with gitleaks and a git pre-commit setup.
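A minimal sketch of that hook, assuming gitleaks v8 is installed locally (the `protect --staged` subcommand scans staged changes in v8; newer releases may rename it, so check `gitleaks --help` for your version):

```shell
#!/bin/sh
# .git/hooks/pre-commit -- abort the commit if gitleaks finds a secret
# in the staged changes. Assumes gitleaks v8 is on PATH.
gitleaks protect --staged --verbose
exit $?
```

Make the file executable (`chmod +x .git/hooks/pre-commit`), or use a hook manager so the whole team gets it, since `.git/hooks/` isn't committed to the repo.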

If your environment has pipelines/runners, ALSO add a job (or whatever your CI variant calls it) that triggers on commits to run gitleaks.

That won't stop secrets from being committed, but you'll get a warning that they are being stored.
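For the pipeline side, one sketch using GitHub Actions: the gitleaks project publishes an official action, though the exact workflow below is an assumption-laden example (verify the action name and version against the gitleaks README, and adapt for GitLab CI, Jenkins, etc.):

```yaml
# .github/workflows/gitleaks.yml -- scan every push and pull request.
name: gitleaks
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so earlier commits are scanned too
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

`fetch-depth: 0` matters here: a shallow checkout would only scan the latest commit, missing secrets buried earlier in the branch history.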