I’m excited to share some new research that was just published in Big Data & Society. We use factorial vignettes to understand the contextual factors that shape attitudes toward research uses of data on 3 platforms: Instagram (image-based), Reddit (pseudonymous communities), and dating apps (sensitive data). Quick thread; read the paper here: https://journals.sagepub.com/doi/10.1177/20539517231164108
This study builds on earlier work we did using a similar methodology and survey instrument with Facebook users, available here: https://journals.sagepub.com/doi/10.1177/20563051211033824. Both papers were led by the amazing Sarah Gilbert (@sarahgilbert), as well as Katie Shilton. These findings reiterate that the most important factor influencing attitudes toward data reuse is consent, something that is notoriously difficult to capture in large-scale data analyses. We also explore interesting differences across platforms.
We use these findings to make three key recommendations for researchers navigating online data use expectations. These focus on identifying platform norms and expectations related to data use, accounting for the affordances of a given platform, and increasing user awareness of research and data collection.

While some norms are consistent across platforms, others are not. As we note, predicting sexuality/sexual preferences was highly concerning for Reddit & Instagram users, but not dating app users. This makes sense given the differing user goals on these platforms.

For researchers, our advice is to not generalize findings from one platform to others without considering how norms vary & adjusting data collection/analysis (or not doing the research at all!) based on those differences.

When considering affordances, visibility of the data played a big role. It's not just whether data is public or private, but the underlying assumptions and intentions with data sharing. People may make sensitive details (like HIV status) visible on a dating profile, but that doesn't mean they're okay with that data being used for additional purposes. Grindr caught a lot of flak for doing just this, as The New York Times reported in "Grindr Sets Off Privacy Firestorm After Sharing Users’ H.I.V.-Status Data": https://t.co/ESbrWeLWGN
We discuss how ephemerality and identifiability are potentially problematic, e.g., when collecting multiple posts from Reddit users. Patterns in where and what users post could easily identify individuals. A lack of a real name is insufficient protection from re-identification.
Finally, while informed consent procedures or opting in to data use remain the gold standard, these practices may be impossible to do at scale. We ask, what does it mean to use social media data without consent when the people described by that data expect us to ask for consent? Furthermore, when is operating without consent, despite user expectations, appropriate?
These questions require serious reflection and may require significant changes to a study design. Researchers should also document their decision process in their method section. And whenever possible, researchers should try to increase end-user awareness of the data collection and use, share results with end users, and provide straightforward mechanisms for users to have their data deleted. Not so hard, right?

These recommendations may feel hard to incorporate, but we hope researchers recognize that just because data is available to them doesn't mean users don't care about who accesses that data. Put yourself in users' shoes & consider potential downstream harms.

Read the paper for more details and check out http://pervade.umd.edu for more info on PERVADE (Pervasive Data Ethics for Computational Research). We're currently building a decision-support tool to help researchers identify & work through many of these questions & will share it soon. /end