Saige Rutherford

@being_saige
344 Followers
371 Following
64 Posts
Ph.D. candidate thinking about machine learning & mental health from a human rights perspective at the Donders Institute & University of Michigan. American living in the Netherlands with my dog, Charlie-Mop. Kindness & Curiosity. Utility >> Intelligence.
GitHub: https://github.com/saigerutherford
LinkedIn: https://www.linkedin.com/in/saigerutherford/
Website: https://www.beingsaige.com

🎞️ Representation learning on relational data to automate data preparation

My presentation at @ida: Data preparation is crucial to analysis, but better pipelines can reduce this need 🌟
https://speakerdeck.com/gaelvaroquaux/representation-learning-on-relational-data-to-automate-data-preparation

Many of these data patterns are in the examples of https://dirty-cat.github.io/

Representation learning on relational data to automate data preparation

In standard data-science practice, significant effort is spent on preparing the data before statistical learning. One reason is that the data come from various tables, each with its own subject matter and its own specificities. This is unlike natural images, or even natural text, where universal regularities have enabled representation learning, fueling the deep learning revolution. I will present progress on learning representations with data tables, overcoming the lack of simple regularities. I will show how these representations decrease the need for data preparation: matching entities and aggregating data across tables. Character-level modeling enables statistical learning without normalized entities, as in the <a href="https://dirty-cat.github.io">dirty-cat library</a>. Representation learning across many tables, describing objects of different natures and varying attributes, can aggregate the distributed information, forming vector representations of entities. As a result, we created general-purpose embeddings that enrich many data analyses by <a href="https://soda-inria.github.io/ken_embeddings/">summarizing all the numerical and relational information in Wikipedia for millions of entities: cities, people, companies, books</a>.

References:
[1] Marine Le Morvan, Julie Josse, Erwan Scornet, & Gaël Varoquaux (2021). <a href="https://proceedings.neurips.cc/paper/2021/hash/5fe8fdc79ce292c39c5f209d734b7206-Abstract.html">"What's a good imputation to predict with missing values?" Advances in Neural Information Processing Systems, 34, 11530-11540.</a>
[2] Patricio Cerda and Gaël Varoquaux. <a href="https://ieeexplore.ieee.org/abstract/document/9086128">"Encoding high-cardinality string categorical variables." IEEE Transactions on Knowledge and Data Engineering (2020).</a>
[3] Alexis Cvetkov-Iliev, Alexandre Allauzen, and Gaël Varoquaux. <a href="https://ieeexplore.ieee.org/abstract/document/9758752">"Analytics on Non-Normalized Data Sources: more Learning, rather than more Cleaning." IEEE Access 10 (2022): 42420-42431.</a>
[4] Alexis Cvetkov-Iliev, Alexandre Allauzen, and Gaël Varoquaux. <a href="https://hal.science/hal-03848124">"Relational data embeddings for feature enrichment with background information." Machine Learning (2023): 1-34.</a>
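The character-level idea behind the abstract can be sketched without any library: represent each category string by its set of character 3-grams and compare sets with Jaccard similarity, so near-duplicate entries (a misspelled "acountant" vs. "accountant") land close together without entity normalization. This is an illustrative sketch of the general technique, not the dirty-cat implementation or its API.

```python
def ngrams(s, n=3):
    """Character n-grams of a string, with padding so word edges count."""
    s = f"  {s.lower()} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two strings."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb)

# A "dirty" categorical column with a typo and a near-duplicate entry.
categories = ["accountant", "acountant", "senior accountant", "engineer"]

# Encode each entry as its similarity to every observed category: misspelled
# variants get vectors close to their canonical form, usable as ML features.
vectors = [[similarity(c, ref) for ref in categories] for c in categories]
```

A downstream model trained on `vectors` would treat "acountant" almost like "accountant" instead of as an unrelated category, which is the intuition behind character-level encoders for high-cardinality string variables.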

Speaker Deck
🚨 NEW from our Duke data brokerage research team: former student Joanne Kim asked 37 data brokers about buying mental health data, 11 of which were willing to sell it. Advertised data included depression, insomnia, ADHD, anxiety, ... — some for just hundreds of dollars. https://techpolicy.sanford.duke.edu/data-brokers-and-the-sale-of-americans-mental-health-data/
Data Brokers and the Sale of Americans’ Mental Health Data - Tech Policy @ Sanford

Data Brokers and the Sale of Americans’ Mental Health Data The Exchange of Our Most Sensitive Data and What It Means for Personal Privacy  By: Joanne Kim   Overview: This...

Tech Policy @ Sanford

New in TiNS: The tricky business of defining brain functions

“Observations lead to interpretations. Interpretations become concepts. And concepts may become dogmas that feel so intuitive, so natural, that they are accepted without question. We should, from time to time, re-evaluate the core beliefs of our fields of study.”

https://www.cell.com/trends/neurosciences/fulltext/S0166-2236(22)00213-2

I really wish there was wider awareness of this issue, especially outside of academia

"Dear all,

We are thrilled to announce the inaugural #ComputationalPsychiatry Conference to take place at Trinity College Dublin on July 6-8th, 2023 (#cpconf2023)

https://www.cpconf.org/

One of the key aims of #ComputationalNeuroscience is to construct theoretical accounts of normal mental function that link characterizations of #neurobiology, #psychology and aspects of the environment. In Computational Psychiatry (CP), these theories, realized in models at various scales, are used to elucidate dysfunction.

The 2023 Computational Psychiatry Conference (7th and 8th July) will contain six sessions, each with a keynote talk from senior faculty and also contributed talks and panel discussions.

The session themes will include Diagnostics, Reinforcement Learning models, Individual-level prediction, Development, Animal models and Treatments. There will also be poster sessions on both days.

The tutorial session (afternoon of 6th July) will contain three introductory talks on #psychiatry for non-clinicians, #BehaviouralModelling using #BayesianInference and #ReinforcementLearning, and #MachineLearning.

Abstract submissions close on March 15th, 2023. We will be able to support 10 participants with travel awards based on a competitive review of their abstract submissions. Top submissions will also be invited as talks.

We look forward to seeing everyone in Dublin this summer!"

Computational Psychiatry Conference

University of Minnesota (July 16-18, 2024)

#cpconf2024

🚨 Excited to share that Veronika (@DrVeronikaCH) and I are organizing the webinar series: Datasets through the L👀king-Glass

🤝 We aim to bring together scientists interested in understanding how the data affects the algorithms and society as a whole.

http://purrlab.github.io/webinar/

Webinar | PURRlab @ IT University of Copenhagen

Lab website for PURRlab (Pattern Recognition Revisited lab), IT University of Copenhagen.

We all admire Amsterdam for having the vision to replace car infrastructure with bike infrastructure. We see the positive uplift in small business activity, and livability. 🙂👍🏿

Now imagine doing that in reverse. Replace relatively safe, walkable, and bikeable infrastructure with car infrastructure. In fact, put in freeways. Demolish entire thriving, wealthy neighborhoods for freeways that don't serve the neighborhood.🙃

That's what we did to Black folk. That's how we destroyed Black wealth.

2022, a new scientific adventure: machine learning for health and social sciences

A small retrospective on last year: I embarked on a new scientific adventure, @Soda_Inria, a team focused on machine learning for health and social science.

The team has existed for almost a year, and the vision is shaping up. I wrote a short text to share illustrations of where we are at.

https://gael-varoquaux.info/science/2022-a-new-scientific-adventure-machine-learning-for-health-and-social-sciences.html

2022, a new scientific adventure: machine learning for health and social sciences -- Gaël Varoquaux: computer / data / health science

Gaël Varoquaux, computer / data / health science

Just had a first meeting with researchers at a hospital who wanted to try out a methodological review board https://www.nature.com/articles/d41586-022-04504-8 in their department, because they felt it had great potential to improve the quality of their research. I'm looking forward to helping them.
Is my study useless? Why researchers need methodological review boards

Making researchers account for their methods before data collection is a long-overdue step.

Hi everyone working in AI and/or wondering about how AI impacts our lives: you really want to listen to this excellent, clear, accessible and to-the-point interview with @timnitGebru “Is ethical AI possible?” #AIEthics https://open.spotify.com/episode/0Zyexhty6AEbINudjfnuRB?si=zZ3amJ6gQxK_6FCTq7nL9g&context=spotify%3Ashow%3A6NOJ6IkTb2GWMj1RpmtnxP
Is ethical AI possible?

Listen to this episode from The Gray Area with Sean Illing on Spotify. Sean Illing talks with Timnit Gebru, the founder of the Distributed AI Research Institute. She studies the ethics of artificial intelligence and is an outspoken critic of companies developing new AI systems. Sean and Timnit discuss the power dynamics in the world of AI, the discriminatory outcomes that these technologies can cause, and the need for accountability and transparency in the field.

Host: Sean Illing (@seanilling), host, The Gray Area
Guest: Timnit Gebru (@timnitGebru), founder, Distributed AI Research Institute

References:
“The Exploited Labor Behind Artificial Intelligence” by Adrienne Williams, Milagros Miceli, and Timnit Gebru (Noema; Oct. 13, 2022)
“Effective Altruism Is Pushing a Dangerous Brand of ‘AI Safety’” by Timnit Gebru (Wired; Nov. 30, 2022)
“Datasheets for Datasets” by Timnit Gebru, et al. (CACM; Dec. 2021)
“In Emergencies, Should You Trust a Robot?” by John Toon (Georgia Tech; Feb. 29, 2016)
“We read the paper that forced Timnit Gebru out of Google. Here’s what it says” by Karen Hao (MIT Technology Review; Dec. 4, 2020)
“On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” by Timnit Gebru, et al. (Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency; March 2021)

Spotify
It's such painful news (not to mention bad optics) to see MS laying off 10K to invest more in AI while OpenAI outsources traumatic labeling for $2/hour. Tech is broken.