Olivier Grisel

@ogrisel@sigmoid.social
Machine Learning Engineer at :probabl., scikit-learn core contributor. #Python, #Pydata, #MachineLearning & #DeepLearning.
GitHub: https://github.com/ogrisel
Twitter: https://twitter.com/ogrisel
Gravatar: https://en.gravatar.com/oliviergrisel

Call for sponsors: PyData Paris 2025!

Join us and be part of one of the best open-source events of the year.

Sponsoring will showcase your support for the community and allow you to reach a wide audience of tech and data enthusiasts from academia and industry.

Contact us:
Email: pydata@quantstack.net
Website: https://pydata.org/paris2025/sponsorship-opportunites#tiers

Let's grow together at PyData Paris 2025!

#PyData #NUMFOCUS #PyDataParis

Sponsor — PyData Paris 2025

PyData Paris 2025

Working on core array computing libraries that power #scientificPython? #EuroSciPy2025 wants your proposals on optimized array operations, vectorization techniques, and numerical foundations. Submit your groundbreaking work!

https://pretalx.com/euroscipy-2025/cfp

#ScientificComputing #Python #EuroSciPy #numpy #scipy #pandas #polars

EuroSciPy 2025

Schedule, talks and talk submissions for EuroSciPy 2025

The call for proposals for PyData Paris 2025 is now open!
Don't delay, get your submissions early.
CfP Deadline: Sunday 13th April 2025

📍 Cité des sciences et de l'industrie
📆 September 30th and October 1st.

https://pydata.org/paris2025/cfp

Call for Proposals — PyData Paris 2025

PyData Paris 2025

📢 We are excited to announce the keynote speakers for our 2025 conference:
- Alenka Frim from United.Cloud
- @ralfgommers from Quansight
- @underdarkGIS from the Austrian Institute of Technology

Join us at PyData Paris at Cité des Sciences from Sep 30 to Oct 1, 2025.

Check out our blog post for more details!
https://medium.com/@PyDataParis/pydata-paris-2025-50ff2bf2dc39

PyData Paris 2025 Keynotes - PyData Paris - Medium

We are thrilled to announce the keynote speakers for the upcoming PyData Paris 2025, the leading gathering of the open-source data science and AI/ML community in France. PyData Paris will take place…

Medium

Geocomputation with Python: Now in Print!

Today, I'm super excited to share with you the announcement that our open source textbook "Geocomputation with Python" has finally arrived in print and is now available for purchase from Routledge.com, Amazon.com, Amazon.co.uk, and other booksellers. "Geocomputation with Python" (or geocompy for short) covers the entire range of standard GIS operations for both vector and raster data…

http://anitagraser.com/2025/01/31/geocomputation-with-python-now-in-print/

Free and Open Source GIS Ramblings

Stratified cross-validation considered harmful?

Since the early days of @sklearn, we have nudged users into using stratified cross-validation in the presence of class imbalance.

I think this was a design mistake. Here is a notebook to back my claim:

https://gist.github.com/ogrisel/af21bdc55a2c02671a48c68631ee7294

And here is a bluesky summary thread:

https://bsky.app/profile/ogrisel.bsky.social/post/3lcnjg2iv4s2w

Impact of the use of stratified cross-validation on the assessment of epistemic uncertainty in ML performance metrics (stratified_cv.ipynb)

Gist
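The linked notebook is the actual argument; the sketch below is not taken from it. It only illustrates one mechanism behind the claim, assuming a hypothetical toy label vector with about 5% positives: stratified splitting pins every test fold to nearly the same class prevalence, so the fold-to-fold scatter no longer reflects the prevalence variation that a fresh test sample would show. That missing variation is one way per-fold metric spreads can understate uncertainty. The helper names below are hypothetical stand-ins for scikit-learn's KFold and StratifiedKFold splitters.

```python
import random
import statistics


def kfold_test_folds(y, k, rng):
    """Plain shuffled K-fold: split a random permutation into k test folds."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def stratified_test_folds(y, k, rng):
    """Stratified K-fold: deal each class's shuffled indices round-robin over
    the folds so every fold gets (almost) the same class proportions."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def prevalence_spread(y, folds):
    """Population std. dev. of the positive-class rate across test folds."""
    rates = [sum(y[i] for i in fold) / len(fold) for fold in folds]
    return statistics.pstdev(rates)


def average_spread(y, k, make_folds, reps=200, seed=0):
    """Average the spread over many random re-splits of the same labels."""
    rng = random.Random(seed)
    return statistics.mean(
        prevalence_spread(y, make_folds(y, k, rng)) for _ in range(reps)
    )


# Hypothetical toy labels: 1000 samples, roughly 5% positives.
data_rng = random.Random(42)
y = [1 if data_rng.random() < 0.05 else 0 for _ in range(1000)]

plain = average_spread(y, 5, kfold_test_folds)
strat = average_spread(y, 5, stratified_test_folds)
print(f"plain K-fold prevalence spread:      {plain:.4f}")
print(f"stratified K-fold prevalence spread: {strat:.4f}")
```

With plain K-fold, the positive rate in each test fold fluctuates roughly as it would for an independent test set of the same size; stratification suppresses that fluctuation almost entirely.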

The video recording of my #PyData Paris 2024 presentation on probabilistic predictions, classifier calibration and optimal decision-making under uncertainty is now online:

https://www.youtube.com/watch?v=-gYnfA0e5ic

KEYNOTE: Olivier Grisel - Handling predictive uncertainty in Machine Learning | PyData Paris 2024

YouTube
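The talk is the reference here; the sketch below only shows the standard cost-sensitive decision rule that this line of reasoning usually leads to, assuming the classifier's probabilities are well calibrated and using hypothetical costs. With a calibrated p = P(y=1 | x), predicting positive has expected cost (1 − p)·c_FP and predicting negative has expected cost p·c_FN, so the positive decision wins once p exceeds c_FP / (c_FP + c_FN). All function names are illustrative.

```python
def bayes_threshold(cost_fp, cost_fn):
    """Probability above which predicting 'positive' has lower expected cost."""
    return cost_fp / (cost_fp + cost_fn)


def expected_cost(p, predict_positive, cost_fp, cost_fn):
    """Expected cost of a decision given P(y=1 | x) = p (assumed calibrated)."""
    if predict_positive:
        return (1.0 - p) * cost_fp  # wrong only if the true label is negative
    return p * cost_fn              # wrong only if the true label is positive


def decide(p, cost_fp, cost_fn):
    """Pick the action with the lower expected cost."""
    return p >= bayes_threshold(cost_fp, cost_fn)


# Hypothetical costs: a missed positive is 9 times worse than a false alarm.
COST_FP, COST_FN = 1.0, 9.0
print(bayes_threshold(COST_FP, COST_FN))  # 0.1, not the default 0.5
for p in (0.05, 0.2, 0.6):
    print(p, decide(p, COST_FP, COST_FN))
```

The rule is only as good as the probabilities fed into it: a miscalibrated model shifts which predictions land on each side of the threshold, which is why calibration and decision-making are discussed together.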

Human fact-checkers spend a lot of time retrieving the true context of fake news images 🕵️

Can AI help? YES! But many open challenges remain!

Meet the 5Pils dataset for image contextualization at #EMNLP2024! Learn more in this 🧵 (1/8) #NLProc
📰 arxiv.org/abs/2408.09939

Were RNNs All We Needed?

https://arxiv.org/abs/2410.01201

A #MachineLearning #Paper by Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, Hossein Hajimirsadeghi

Apparently, stripped-down variants of the old LSTM and GRU architectures can rival more recent architectures for long-sequence modelling, such as Mamba, while being highly parallelizable during training.

The introduction of Transformers in 2017 reshaped the landscape of deep learning. Originally proposed for sequence modelling, Transformers have since achieved widespread success across various domains. However, the scalability limitations of Transformers - particularly with respect to sequence length - have sparked renewed interest in novel recurrent models that are parallelizable during training, offer comparable performance, and scale more effectively. In this work, we revisit sequence modelling from a historical perspective, focusing on Recurrent Neural Networks (RNNs), which dominated the field for two decades before the rise of Transformers. Specifically, we examine LSTMs (1997) and GRUs (2014). We demonstrate that by simplifying these models, we can derive minimal versions (minLSTMs and minGRUs) that (1) use fewer parameters than their traditional counterparts, (2) are fully parallelizable during training, and (3) achieve surprisingly competitive performance on a range of tasks, rivalling recent models including Transformers.

arXiv.org
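The "fully parallelizable during training" point can be made concrete. As I read the paper, the minGRU computes its gate and candidate state from the current input only: z_t = σ(Linear(x_t)), h̃_t = Linear(x_t), h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t. Each step is then an affine map of h_{t−1}, and affine maps compose associatively, so every hidden state can be obtained with a parallel prefix scan instead of a strictly sequential loop. The sketch below checks that equivalence on a scalar toy example. The scalar weights are a simplifying assumption standing in for linear layers, and the Hillis-Steele scan is a generic choice, not necessarily the scan the authors use.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def min_gru_coeffs(xs, w_z, w_h):
    """Rewrite each minGRU step as an affine map h_t = a_t * h_{t-1} + b_t.

    The gate z_t and candidate h~_t depend only on x_t, never on h_{t-1},
    so the recurrence is linear in h. Scalar weights stand in for the
    paper's linear layers (a simplifying assumption)."""
    coeffs = []
    for x in xs:
        z = sigmoid(w_z * x)   # update gate, from the input alone
        h_tilde = w_h * x      # candidate state, from the input alone
        coeffs.append((1.0 - z, z * h_tilde))
    return coeffs


def run_sequential(coeffs, h0=0.0):
    """Reference: the usual step-by-step RNN loop."""
    hs, h = [], h0
    for a, b in coeffs:
        h = a * h + b
        hs.append(h)
    return hs


def combine(first, second):
    """Compose two affine maps: apply `first`, then `second` (associative)."""
    a1, b1 = first
    a2, b2 = second
    return (a1 * a2, a2 * b1 + b2)


def run_parallel_scan(coeffs, h0=0.0):
    """Hillis-Steele inclusive scan over `combine`. Updates within a round are
    independent of each other, so each round could run in parallel, and only
    about log2(T) rounds are needed."""
    acc = list(coeffs)
    step = 1
    while step < len(acc):
        acc = [combine(acc[i - step], acc[i]) if i >= step else acc[i]
               for i in range(len(acc))]
        step *= 2
    # acc[i] now maps h0 directly to h_i.
    return [a * h0 + b for a, b in acc]


xs = [0.5, -1.0, 2.0, 0.3, -0.7, 1.5, 0.0, 0.9]  # toy input sequence
coeffs = min_gru_coeffs(xs, w_z=0.8, w_h=1.2)       # hypothetical weights
seq = run_sequential(coeffs, h0=0.1)
par = run_parallel_scan(coeffs, h0=0.1)
print(all(math.isclose(s, p, abs_tol=1e-12) for s, p in zip(seq, par)))  # True
```

A standard GRU cannot be rewritten this way because its gates read h_{t−1}, which makes each step nonlinear in the previous state.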

We often see statistics showing that there are few software companies in the EU, cited as evidence of a lack of innovation.

But 48% of maintainers of open source projects live in Europe, compared to 38% in North America and 8% in Asia. And the number is growing.

https://explore.tidelift.com/2024-survey

The 2024 Tidelift state of the open source maintainer report

Tidelift