Myrthe Reuver

421 Followers
444 Following
95 Posts
Myrtle without the /l/. PhD candidate on diversity in news recommendation for public debate @ CLTL at VU Amsterdam 📚 Research interests: ethics in ML & #NLProc, their implications, and cats.
Website: https://myrthereuver.github.io/

Our work on Fragmentation in News Recommendations is accepted at NORMalize 2023, at RecSys 2023! 🎉 Alessandra Polimeno's master thesis work, supervised by me, Sanne Vrijenhoek, and Antske Fokkens.

Fragmentation is when people do not see the same news stories.

💡 We find hierarchical clustering with SentenceBERT best at detecting Fragmentation, evaluating intrinsically and extrinsically.
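The story-detection step behind this can be sketched roughly as follows. This is an illustrative pipeline, not the paper's exact code: the toy vectors here stand in for SentenceBERT embeddings of headlines (in a real run they would come from a sentence-transformers model), and the 0.5 distance threshold is arbitrary. Articles landing in the same cluster are treated as the same news story; Fragmentation is then about comparing which story clusters different users were shown.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy 4-dim vectors standing in for SentenceBERT embeddings of headlines;
# in the real pipeline these would come from a sentence-transformers model.
embeddings = np.array([
    [0.90, 0.10, 0.0, 0.0],   # storm story, outlet A
    [0.85, 0.15, 0.0, 0.0],   # storm story, outlet B
    [0.00, 0.00, 0.9, 0.1],   # unrelated fraud-detection story
])

# Average-linkage hierarchical clustering on cosine distances.
Z = linkage(pdist(embeddings, metric="cosine"), method="average")

# Cut the dendrogram at an (illustrative) distance threshold of 0.5:
# the two storm headlines end up in one cluster, the third in another.
labels = fcluster(Z, t=0.5, criterion="distance")
```

Comparing the resulting cluster labels across users' recommendation lists gives a story-level (rather than article-level) view of how fragmented their news diets are.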

I present (virtually) next week, Sept 19! 😃

https://arxiv.org/pdf/2309.06192.pdf

#NLProc #RecSys #RecSys2023

"Recently, AI developers have claimed their models perform well not only on a single task but in a variety of situations. “One of the things that’s going on with AI right now is that the companies producing it are claiming that these are basically everything machines,” Bender said. “You can’t test that claim.”

In the absence of any real-world validation, journalists should not believe the company’s claims."

I hope journalists heed this advice!

https://www.cjr.org/analysis/how-to-report-better-on-artificial-intelligence.php

How to report better on artificial intelligence

In the past few months we have been deluged with headlines about new AI tools and how much they are going to change society.  Some reporters have done amazing work holding the companies developing AI accountable, but many struggle to report on this new technology in a fair and accurate way.   We—an investigative reporter, a […]

Columbia Journalism Review

“Algorithmic fraud detection is at its core anti-scientific, the skull measuring of the 21st century.”

Read my column from today in
@Trouw, about the so-called 'fraud fighting' at DUO.

https://www.trouw.nl/cs-baa4e278

DUO was doing skull measuring, not fraud fighting

After the revelation that the student-finance organization DUO used a fraud-detection algorithm to hunt for people with a migration background, minister Dijkgraa...

Trouw

For the non-Dutchies: I'm sure you've all seen the discourse that Twitter preventing public access to tweets is a big problem for things like emergency services and weather warnings.

This is now already playing out in Amsterdam. There's a bit of a storm, so the fire department sent out an emergency text message to everyone. The first version included a link to Twitter to follow the latest updates. Then they realized the problem, and a second version, which went to a different region as the storm travelled there, carried the warning without the Twitter link.

I think the quote by the press person from the fire department really says it all: "We don't check every day if everyone can still read our tweets"


Considering that Amsterdam is the first municipality to run its own Mastodon server, it seems like the solution is readily available.

RE:
https://mastodon.social/users/BjornW/statuses/110660568459218026

I just published "The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic’s con"

https://softwarecrisis.dev/letters/llmentalist/

I've been working on this essay on and off for a few months now. I kept convincing myself that I had to be wrong, this is too dumb to be true, but then the research I did changed my mind again.

The LLMentalist Effect: how chat-based Large Language Models rep…

The new era of tech seems to be built on superstitious behaviour

Out of the Software Crisis

The larger (and thus more representative) the sample, the more racist the AI system's outcomes. Shameful but not too surprising (https://arxiv.org/abs/2306.13141, @abebab ). This is probably an accurate reflection of stereotyped human behaviour. We are the problem!

In humans, stereotypes can be reduced with explicit training on positive associations.
Perhaps, this would be a more effective way to overcome some AI biases.
https://journals.sagepub.com/doi/10.1080/17470218.2010.493615

On Hate Scaling Laws For Data-Swamps

"Scale the model, scale the data, scale the GPU-farms" is the reigning sentiment in the world of generative AI today. While model scaling has been extensively studied, data scaling and its downstream impacts remain under explored. This is especially of critical importance in the context of visio-linguistic datasets whose main source is the World Wide Web, condensed and packaged as the CommonCrawl dump. This large scale data-dump, which is known to have numerous drawbacks, is repeatedly mined and serves as the data-motherlode for large generative models. In this paper, we: 1) investigate the effect of scaling datasets on hateful content through a comparative audit of the LAION-400M and LAION-2B-en, containing 400 million and 2 billion samples respectively, and 2) evaluate the downstream impact of scale on visio-linguistic models trained on these dataset variants by measuring racial bias of the models trained on them using the Chicago Face Dataset (CFD) as a probe. Our results show that 1) the presence of hateful content in datasets, when measured with a Hate Content Rate (HCR) metric on the inferences of the Pysentimiento hate-detection Natural Language Processing (NLP) model, increased by nearly 12% and 2) societal biases and negative stereotypes were also exacerbated with scale on the models we evaluated. As scale increased, the tendency of the model to associate images of human faces with the "human being" class over 7 other offensive classes reduced by half. Furthermore, for the Black female category, the tendency of the model to associate their faces with the "criminal" class doubled, while quintupling for Black male faces. We present a qualitative and historical analysis of the model audit results, reflect on our findings and its implications for dataset curation practice, and close with a summary of our findings and potential future work to be done in this area.

arXiv.org
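The Hate Content Rate (HCR) metric from the abstract above boils down to a simple proportion: the fraction of a dataset's samples that a hate-detection model flags. A minimal sketch, with a toy predicate standing in for the Pysentimiento model the paper actually uses (the function name and example captions are my own, purely illustrative):

```python
def hate_content_rate(samples, is_hateful):
    """Fraction of samples the detector flags as hateful (HCR-style metric)."""
    flags = [is_hateful(s) for s in samples]
    return sum(flags) / len(flags)

# Toy captions and a toy detector; the paper runs a real hate-detection
# NLP model (Pysentimiento) over LAION samples instead.
captions = [
    "a person smiling at the camera",
    "an example of a hateful caption",
    "a city street at night",
    "another neutral caption",
]
rate = hate_content_rate(captions, lambda s: "hateful" in s)
# → 0.25: one of the four captions is flagged
```

The paper's headline finding is then a comparison of this rate between the 400M- and 2B-sample dataset variants.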
I would prefer it if Elon Musk were destroying his site during the work week. This isn't the first time.

I am super excited to announce that from today until the end of summer, I will be working as a Summer Intern "Computational Linguistics & AI" at LinkedIn in Dublin, Ireland! 🤩 It feels a bit like #LinkedInception, posting about LinkedIn on LinkedIn... 😅

I’m really eager to see how Computational Linguistics and #NLProc work in such a large-scale and applied scenario! 😁

#linkedin #intern #ai #NLProc #summer #machinelearning #PhD

The wonderful @fe_loe had her PhD defense, and is now dr. Loecherbach, cum laude!! 😱

Her dissertation “Diversity of News Consumption in a Digital Information Environment” is a joy to read for anyone interested in news & diversity, and her careful answers made me (re)think a lot! 🤔

Besides being an excellent researcher, she was and is also a role model for me. 🤗 Felicia, I wish you all the future success and joy, in your research and beyond it! ✨ You can do everything!

New postdoc job in AI / CogSci available at the Santa Fe Institute.

Are you a grad student or postdoc interested in working with me on AI systems for abstraction and analogy?

See https://santafe.edu/about/jobs/postdoc-ai for more info. Apply by June 9.

Please share!

(View from our campus)

Jobs: Postdoctoral Fellow (Artificial Intelligence / Cognitive Science) | Santa Fe Institute

SFI seeks a full-time postdoc to collaborate in developing AI models of conceptual abstraction and analogy-making.