As more academics reach Fedi, please PLEASE consider not doing research on users here without explicit opt-in consent

This isn't a zoo

It's not just condescending for you to treat us that way, it's also against a lot of instances' terms of use

See "Use of Scholar Social for research" at the following link for an example:

https://scholar.social/privacy-policy

@socrates Thank you for sharing! I’m not quite sure if I agree with that policy. I feel like a post that is set to be publicly accessible constitutes public communication.

I would agree that publishing more than just anonymized aggregate data or IDs (like Twitter’s TOS suggests) is out of the question, but analyzing public communication can’t require a specific opt-in from every user. That does not seem feasible to me.

@Kudusch @socrates I feel like Kudusch's stance on this is probably more nuanced than is coming across in this thread, but FWIW the Association of Internet Researchers argues, as a scholarly community, that context matters when deciding what should count as "public" for the purpose of research. I recommend reading this post by Hugh Rundle for an idea of how long-time Mastodon users have viewed their contributions and privacy in context. https://www.hughrundle.net/home-invasion/

[Edited for clarity]

Home invasion - Mastodon's Eternal September begins

The fediverse is dealing with a huge wave of Twitter people bringing toxic ideas with them.

@josh @Kudusch Okay the context here is that we told you explicitly not to do it and now you're aware of that expectation

Many of our users came here to *avoid* being included in Cambridge Analytica type situations and we're telling you: No

That's the context

If a person talking in a public place like a coffee shop told you "stop writing down everything I'm saying," what would the right thing to do be?

@socrates @josh I definitely get what you are saying. An explicit opt-out, like the one for the users of scholar.social, must be taken into account. That's basic research ethics.

My background is in disinformation studies, where there are usually very few good-faith actors. Users who actively use the fediverse to spread (potentially harmful) misinformation are tough to identify without first accessing public communication at large.

@socrates @josh

Maybe implementing something like a per-account opt-in for (non-commercial) research that can be accessed through the API could be a solution in the long run.

@Kudusch @josh Now that would be very interesting

@socrates @josh That would still leave the issue of bad-faith actors unsolved, but actually building research consent into the protocol itself might be a good start!

Because even if I access the API from an instance where research is explicitly allowed, I might still get content from users of instances where it is not. Being able to just filter out content from users who have opted out would be very helpful.
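To make the filtering idea concrete, here is a minimal sketch in Python. It assumes a hypothetical per-account `research_opt_in` flag delivered alongside each post; no such field is confirmed to exist in the Mastodon API, and the `Status` type and `consented_only` function are illustrative names only. As a design assumption, a missing flag is treated as no consent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Status:
    """A fetched post, reduced to the fields this sketch needs."""
    account: str                            # e.g. "user@example.social"
    text: str
    # Hypothetical consent flag: True = opted in, False = opted out,
    # None = the account or its instance published no preference.
    research_opt_in: Optional[bool] = None


def consented_only(statuses: List[Status]) -> List[Status]:
    """Keep only posts whose author has explicitly opted in.

    Posts with a missing flag are dropped, so content federated from
    instances that do not expose the flag is excluded by default.
    """
    return [s for s in statuses if s.research_opt_in is True]


timeline = [
    Status("alice@allowed.example", "hello", research_opt_in=True),
    Status("bob@allowed.example", "hi", research_opt_in=False),
    Status("carol@other.example", "hey"),   # no preference published
]
kept = consented_only(timeline)
print([s.account for s in kept])
```

Only the explicitly opted-in account remains. The filter runs on the researcher's side, so it helps good-faith tools and does nothing about bad-faith actors.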

@Kudusch @socrates @josh we had a similar discussion at https://scholar.social/@mdanganh/109304920124242505 where, among others, @javier suggested a "do not scrape" meta tag as the default, plus an opt-in option for active consent in the preferences for sharing account contents for research purposes (open science, non-commercial). That info would have to be available via the API.
Tools like {rtoot} should be designed to only scrape accounts that have actively consented (still, *informed* consent is an issue)
Mark Dang-Anh (@mdanganh@scholar.social)

I guess that many fellow linguists and social media researchers who, like me, are new here, are already itching to compile corpora from Mastodon data. I have to admit, so do I. But all I can find on #scraping in toots, server rules and blogs is: "Don't! Scrape! Mastodon!" May this be the right time to rethink ingrained research habits, reconsider ethical standards and adapt to a new community and a different social media culture? Any thoughts? #researchethics @linguistics #corpuslinguistics

@mdanganh @Kudusch @socrates @javier Could you set up a bot/relay tied to the crawler that would notify a user via a DM that their instance or account had been scraped, give a link to a study description, and provide info for opting out post-hoc, even if their initial profile metadata had indicated opt-in? And provide the researchers' contact info, obviously.
@socrates @mdanganh @josh @javier That’s a fantastic idea!
@Kudusch @socrates @mdanganh @javier Unless it becomes spam. No one would want to get 40 of these a month.
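One way to address the spam concern is to rate-limit notices per account. The sketch below is an assumption-laden illustration, not an existing tool: `NoticeThrottle`, its 30-day default, and the idea that crawlers would route notices through one shared relay are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict


class NoticeThrottle:
    """Allow at most one scrape notice per account per time window."""

    def __init__(self, min_gap: timedelta = timedelta(days=30)) -> None:
        self.min_gap = min_gap
        self._last_sent: Dict[str, datetime] = {}

    def should_notify(self, account: str, now: datetime) -> bool:
        """Return True and record the send if the account is not inside
        its quiet window; otherwise return False."""
        last = self._last_sent.get(account)
        if last is not None and now - last < self.min_gap:
            return False
        self._last_sent[account] = now
        return True


throttle = NoticeThrottle()
start = datetime(2022, 11, 10)
print(throttle.should_notify("alice@example.social", start))
print(throttle.should_notify("alice@example.social", start + timedelta(days=5)))
```

The first call permits a notice and the second, five days later, suppresses it. A suppressed notice would still need to be surfaced somehow, for example in a periodic digest, or the user never learns about the later study.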