Can researchers detect #AI bots taking paid surveys?
#Prolific tested humans and #LLM agents with various #dataQuality checks.
- The company says they caught 100% of the non-humans.
- My take-away: #reCAPTCHA and #mouseTracking caught 95%
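For intuition, here is a toy sketch of one #mouseTracking heuristic: flagging near-perfectly straight cursor paths, which scripted agents produce far more often than humans. This is my own illustration, not Prolific's method, and the threshold is an arbitrary placeholder.

```python
import math

def path_linearity(points):
    """Ratio of straight-line distance to total path length.
    points: list of (x, y) cursor samples. Values near 1.0 mean
    an almost perfectly straight path, rare for human movement."""
    if len(points) < 2:
        return None  # not enough data to judge
    total = sum(math.dist(points[i], points[i + 1])
                for i in range(len(points) - 1))
    direct = math.dist(points[0], points[-1])
    return direct / total if total > 0 else None

def looks_automated(points, threshold=0.98):
    """Flag a trajectory as bot-like if it is nearly straight.
    0.98 is an arbitrary placeholder, not a published cutoff."""
    lin = path_linearity(points)
    return lin is not None and lin > threshold
```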
Hey #SurveyMethods and #MedEd folks:
In a workshop for #MedSchool faculty about questionnaire design and survey research methods, what
- objectives should be prioritized?
- materials are out there?
- activities are worth it?
- assessments work?
We've found recruiting people for online #research via #onlineAdvertising yielded good results on overt and covert #dataQuality measures (perhaps because participation incentives aren't financial):
Attention checks passed ≅ 2.6 out of 3
ReCAPTCHA (v3) ≅ 0.94 out of 1.0
Sample size > 5000 (from six continents)
https://doi.org/10.1017/S0034412525000198
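For anyone wanting to run similar screens, a minimal sketch of combining the overt and covert measures above. The cutoffs are illustrative assumptions, not the paper's criteria (though Google's reCAPTCHA v3 docs do suggest 0.5 as a default threshold).

```python
def passes_quality_screen(attention_passed, recaptcha_v3_score,
                          min_attention=2, min_recaptcha=0.5):
    """Overt screen: attention checks passed (0-3).
    Covert screen: reCAPTCHA v3 score (0.0-1.0).
    Both cutoffs here are illustrative, not from the paper."""
    return (attention_passed >= min_attention
            and recaptcha_v3_score >= min_recaptcha)

# A respondent passing 3/3 checks with score 0.9 is kept
print(passes_quality_screen(3, 0.9))   # True
print(passes_quality_screen(1, 0.94))  # False: failed attention screen
```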
#surveyMethods #cogSci #psychology #xPhi #QualityControl #econ #marketing
I forgot to share the #mTurk data quality result that got scooped:
“In late 2020…. Participants from the United States were recruited from Amazon Mechanical Turk, #CloudResearch, #Prolific, and a #university. One participant source yielded up to 18 times as many low-quality respondents as the other three.”
https://doi.org/10.1093/analys/anaf015
#psychology #philosophy #surveyMethods #quantMethods #dataScience #qualityControl
RE: https://mastodon.acm.org/@neilernst/115607469843033537
😳 The #AI survey taker "rendered [attention quality] checks [ACQs] effectively obsolete. Across 6,000 total trials..., [it] committed only 10 errors, achieving an overall pass rate of 99.8% and scoring perfectly on 18 of the 20 ACQ types."
#surveyMethods #psychometrics #psychology #tech #philSci #SciComm #dataQuality
Can thinking aloud accurately capture how we decide?
Nisbett & Wilson's 1977 paper famously suggested it can't.
But #NLP and #AI methods may indicate that it can:
- https://escholarship.org/uc/item/4sb936m3
- https://openreview.net/forum?id=1Tny4KgGO2
#cogSci #psychology #philosophy #surveyMethods #thinkAloud #LLM
#ComparativeResearch #EducationMeasurement #SurveyMethods
Measuring education across countries is complex—but crucial for valid, comparable survey data. This study tests 16 coding strategies using ESS data and finds a strong contender for a new international standard.
Read the article:
Schneider, S.L. & Urban, J. (2025). A myriad of options: Validity and comparability of alternative international education variables. Survey Methods: Insights from the Field. https://doi.org/10.13094/SMIF-2025-00008
@GESIS
Thanks for sharing the talk!
Here is the QuestionLink R package and an in-depth tutorial on using it:
https://matroth.github.io/questionlink/index.html
And please note that we also offer consultations regarding harmonization techniques (and other survey method topics).
https://www.gesis.org/en/consulting/survey-methods-consulting
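For the curious: as I understand it, QuestionLink harmonizes different question versions via observed score equating. A minimal Python sketch of the core idea (equipercentile equating), to show the logic only; this is not the package's R API.

```python
import numpy as np

def equipercentile_map(scores_a, scores_b, x):
    """Map score x from instrument A onto instrument B's scale:
    find x's percentile rank among A responses, then return the
    B score at that same percentile. Assumes both samples come
    from comparable populations."""
    scores_a = np.asarray(scores_a, dtype=float)
    p = np.mean(scores_a <= x)       # percentile rank of x in A
    return np.quantile(scores_b, p)  # B score at same percentile

# Toy example: harmonizing a 5-point item onto an 11-point item
a = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]   # responses on 5-point scale
b = [0, 2, 3, 4, 5, 5, 6, 7, 9, 10]  # responses on 11-point scale
print(equipercentile_map(a, b, 4))   # maps 4 -> 7.4 in this toy data
```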
Given that surveys tend to overestimate belief in #conspiracyTheories (https://osf.io/preprints/psyarxiv/zsncr_v1) and support for #politicalViolence (https://doi.org/10.1073/pnas.2116870119), I wonder how much of the correlation between such variables remains after accounting for such measurement error.
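A quick simulation of the worry, under my own toy assumptions: two genuinely uncorrelated attitudes, plus 10% careless respondents who straight-line the top of both scales.

```python
import numpy as np

rng = np.random.default_rng(0)
n, careless_share = 2000, 0.10

# Attentive respondents: two genuinely uncorrelated attitudes
x = rng.normal(size=n)
y = rng.normal(size=n)

# Careless respondents straight-line the top of both scales,
# mimicking inflated "belief" and "support" simultaneously
k = int(n * careless_share)
x[:k] = 3.0
y[:k] = 3.0

# ~0.47 despite zero true correlation among attentive respondents
print(np.corrcoef(x, y)[0, 1])
```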
New #surveyMethods paper replicates and extends differences in #dataQuality, attention, naivety, decision style, etc. by
- online #research recruitment platform (#mTurk, #Prolific, #Qualtrics, #Pollfish)
- device (#mobile v. #desktop)
- person's incentive
Online crowdsourcing platforms such as MTurk and Prolific have revolutionized how researchers recruit human participants. However, because these platforms primarily recruit computer-based respondents, they risk missing people who access the internet exclusively, or mostly, through more widely available mobile devices. There have also been concerns that respondents who rely heavily on such platforms to earn an income provide lower-quality responses.

We therefore conducted two studies, collecting data from the popular MTurk and Prolific platforms, from Pollfish, a self-described mobile-first crowdsourcing platform, and from the Qualtrics audience panel. By distributing the same study across these platforms, we examine data quality and the factors that affect it.

In contrast to MTurk and Prolific, most Pollfish and Qualtrics respondents were mobile-based. Using an attentiveness composite score we constructed, we find mobile-based responses comparable to computer-based responses, demonstrating that mobile devices are suitable for crowdsourcing behavioral research. However, the platforms differ significantly in attentiveness, which is also affected by factors such as the respondent's incentive for completing the survey, their activity before engaging, environmental distractions, and having recently completed a similar study. Further, we find that stronger System 1 thinking is associated with lower attentiveness and mediates the relationship between some of the factors explored (including the device used) and attentiveness. Finally, we raise a concern that most MTurk users can pass frequently used attention checks yet fail less common measures, such as the infrequency scale.
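For readers unfamiliar with such composites, a minimal sketch of how an attentiveness composite is often built: z-score several indicators, then average. The indicator names and data below are hypothetical; the paper's exact construction may differ.

```python
import numpy as np

def attentiveness_composite(indicators):
    """indicators: dict of equal-length arrays, each scored so
    higher = more attentive (e.g. attention checks passed,
    reversed infrequency-scale endorsements, reversed speeding).
    Returns the mean of z-scored indicators per respondent."""
    zs = []
    for vals in indicators.values():
        v = np.asarray(vals, dtype=float)
        sd = v.std()
        zs.append((v - v.mean()) / sd if sd > 0 else np.zeros_like(v))
    return np.mean(zs, axis=0)

# Hypothetical data for three respondents
score = attentiveness_composite({
    "attention_checks_passed": [3, 1, 3],
    "infrequency_items_clean": [5, 2, 5],  # reversed: higher = cleaner
    "not_speeding": [1, 0, 1],
})
print(score)  # respondents 1 and 3 tie; respondent 2 scores lowest
```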