Researchers Scrape 2 Billion #Discord Messages & Publish Them Online

The data was pulled from 3,167 servers & covers posts made between 2015 & 2024, the entire time Discord has been active.

… claim they’ve #anonymized the data, it’s hard to imagine anyone is comfortable with almost a decade of their Discord messages sitting in a public JSON file. Separately, … a Discord tool called "Searchcord" based on a different data set that shows non-anonymized chat histories.
#privacy

https://www.404media.co/researchers-scrape-2-billion-discord-messages-and-publish-them-online/

Researchers Scrape 2 Billion Discord Messages and Publish Them Online

A Brazilian team used Discord’s API to scrape 10% of its open servers.

404 Media

#DuckDuckGo is now offering free, #anonymized access to a number of fast #AI #chatbots that won't train on your data. You currently don't get all the premium models and features of paid services, but you do get access to privacy-promoting, anonymized versions of smaller models like GPT-4o mini from #OpenAI and open-source #MoE (mixture of experts) models like Mixtral 8x7B.

Of course, for truly sensitive or classified data you should never use online services at all. Anything online carries heightened risks of human error; deliberate malfeasance; corporate espionage; legal, illegal, or extra-legal warrants; and network wiretapping. I personally trust DuckDuckGo's no-logging policies and presume their anonymization techniques are sound, but those of us in #cybersecurity know the practical limitations of such measures.

For any situation where those measures are insufficient, you'll need to run your own instance of a suitable model on a local AI engine. However, that's not really the #threatmodel for the average user looking to get basic things done. Great use cases include finding quick answers that traditional search engines aren't good at, or performing common AI tasks like summarizing or improving textual information.

The AI service provides the typical user with essential AI capabilities for free. It also takes steps to prevent for-profit entities with privacy-damaging #TOS from training on your data at whim. DuckDuckGo's approach seems perfectly suited to these basic use cases.

I laud DuckDuckGo for their ongoing commitment to privacy, and for offering this valuable addition to the AI ecosystem.

https://duckduckgo.com/chat

DuckDuckGo AI Chat at DuckDuckGo

DuckDuckGo. Privacy, Simplified.

Can Virtual Reality Make Hiring Fairer?

Anonymizing resumés in a recruitment process does not ensure fairness when hiring decisions hinge on in-person interviews. The problem may now be solvable with virtual reality.

Psychology Today

What are people's thoughts on "#ethicaltelemetry"?

I'm making a simple app that I plan to release on the Play Store, F-Droid, etc.

And I want to track:
- #Anonymized install count
- Source of install (#Source built, Bleeding Edge, #fdroid and #playstore)

(I have a way to track source built/play store/fdroid already but no way to get the #data off-device)

I'm quite new to the whole scene so I'm asking here:
- How do I do it in a way that doesn't piss people off? (ofc target market doesn't care but still)
- I also want to ensure the telemetry method works on phones as #old as possible, like #android 4.1 (API 16), if possible
- How do I #protect myself from some kid figuring out my telemetry method and spamming it with bots?

Intro screen -> Opt in/opt out (with red warning explaining how much it'll help us to have those stats) -> button at the bottom after all the checks called "Login" leading to Login.kt fragment.

I'm thinking of something like this but I still have no idea how to #collect this info
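
One possible shape for the two stats listed above, sketched in Python for brevity (an Android client would do the equivalent in Kotlin or Java): generate a random ID once after opt-in, send only that ID plus the install source, and have the server count distinct IDs. Every name here (`new_install_id`, `InstallCounter`, the source labels) is hypothetical, and the sketch only ignores repeat pings and unknown sources; it is not a real defence against bots.

```python
import uuid
from collections import Counter

# Hypothetical labels for the install sources named in the post.
ALLOWED_SOURCES = {"source-built", "bleeding-edge", "fdroid", "playstore"}


def new_install_id():
    """Random ID created once after opt-in; not derived from any device identifier."""
    return uuid.uuid4().hex


class InstallCounter:
    """Hypothetical server-side tally: one count per distinct install ID."""

    def __init__(self):
        self._seen = set()
        self.by_source = Counter()

    def record(self, install_id, source):
        # Reject unknown sources and repeat pings from the same install.
        if source not in ALLOWED_SOURCES or install_id in self._seen:
            return False
        self._seen.add(install_id)
        self.by_source[source] += 1
        return True


counter = InstallCounter()
first = new_install_id()
counter.record(first, "fdroid")
counter.record(first, "fdroid")                    # duplicate, ignored
counter.record(new_install_id(), "playstore")
counter.record(new_install_id(), "unknown-store")  # unknown source, rejected
print(dict(counter.by_source))
```

Because the ID is random rather than tied to hardware, uninstalling and reinstalling produces a new count, which is the trade-off for not tracking the device itself.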

#Meta will start collecting “anonymized” data about #Quest headset usage
#facebook #privacy #surveillance #anonymized

https://arstechnica.com/?p=2006128

Meta will start collecting “anonymized” data about Quest headset usage

Info on hand/eye tracking, "physical environment," and more could be included.

Ars Technica

Anonymized & Aggregated Transaction Data Powers New AI Models | Brighterion AI - The Triangle Agency

Brighterion's market-ready AI solutions for acquirers are trained on global transaction data for superior fraud detection.

The Triangle Agency

87% of the population in the United States had characteristics that likely made them unique based only on 5-digit ZIP code, gender, and date of birth!

Even #leaked or #anonymized datasets that contain very little information can pose a #privacy risk.

These stats come from a 2000 study by Latanya Sweeney, viewable here (PDF): https://dataprivacylab.org/projects/identifiability/paper1.pdf
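
To see how this kind of uniqueness is measured, one can count how many records share each (ZIP, gender, date of birth) combination. Below is a minimal Python sketch on made-up rows (not data from Sweeney's study); the function name `unique_fraction` is hypothetical.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose quasi-identifier combination is unique."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Made-up example rows; names are already removed ("anonymized").
records = [
    {"zip": "02139", "gender": "F", "dob": "1970-01-15"},
    {"zip": "02139", "gender": "F", "dob": "1970-01-15"},
    {"zip": "02139", "gender": "M", "dob": "1982-07-04"},
    {"zip": "10001", "gender": "F", "dob": "1995-03-22"},
]

fraction = unique_fraction(records, ["zip", "gender", "dob"])
print(f"{fraction:.0%} of records are unique on ZIP + gender + date of birth")
```

A record that is unique on these three fields can be re-identified by anyone holding another list (a voter roll, for example) with the same fields plus a name.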

#Anonymized data is rarely #anonymous

Mozilla and Microsoft have been using this argument for years, but if data is truly anonymous, there is no point in collecting it, because you could not use it at all.

https://flowingdata.com/2021/12/29/anonymized-data-is-rarely-anonymous/

Anonymized data is rarely anonymous

Justin Sherman for Wired points out the farce that is anonymized data: Data on hundreds of millions of Americans’ races, genders, ethnicities, religions, sexual orientations, political beliefs, int…

FlowingData