Jesper Dramsch – Real-world ML

@jesperdramsch
1 Followers
67 Following
59 Posts

I apply #machinelearning in the real world and share what I know.

🌦 Scientist for ML at ECMWF
πŸ’Ύ Fellow Software Sustainability Institute
πŸ“š Teacher #Skillshare 4700+
πŸŽ₯ #Youtube Partner
πŸ₯‡ #Kaggle code #81
πŸŽ“ Phd from DTU

✨ Websites: https://dramsch.net https://pythondeadlin.es
👔 15k on LinkedIn: https://dramsch.net/links
💌 ML Newsletter: https://dramsch.net/newsletter
🏳️‍🌈 they/them

Hello #TwitterMigration friends, here are some ideas for curating your timeline, which is quite like #gardening: what you plant is what will grow here for you.

1. If you mainly follow others who arrived with you, you depend on them staying. Balance with people who post regularly and there will be new growth here when you check.

2. Follow people who boost others. The serendipity of their associations will seed new ideas for you.

3. It’s a new home: you don’t need the same garden you had in your old home.

Learn Machine Learning with Python for free. Session 3 of the scikit-learn MOOC started recently, you still have time to join!

https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn/

By following the MOOC through the FUN platform you can access the forum, do the quizzes for self-evaluation, and earn an Open Badge (https://openbadges.org) upon completion.

The FUN platform respects users' privacy and does not share or sell progress data with anybody.

Machine learning in Python with scikit-learn

Build predictive models with scikit-learn and gain a practical understanding of the strengths and limitations of machine learning!

Anyone else feel that relief of just being able to like posts, without having to think about them appearing in folks' timelines?
I just painstakingly learned that you can't change the visibility of posts after the fact… 😅 😭

TL;DR

Overall, this tutorial is aimed at applied scientists exploring ML for their problems.

We looked at 6 ready-to-use notebooks to make your life easier.

This resource is for you to steal and make better science.

Each tool makes it more likely for
β€’ Your results to go through review
β€’ Others to use and cite your stuff
β€’ The code fairy to smile upon you

We focused on β€œeasy wins” scientists can implement in research to avoid catastrophic failures and increase reproducibility.

βœ‚οΈ Ablation Studies

You know it. I know it.

Data science is trying a lot and finding what works.
It's iterative!

Use ablation studies to switch off components in your solution to evaluate the effect on the final score!

This kind of rigor looks great in a paper!

https://dramsch.net/articles/euroscipy-2022/euroscipy-tutorial-6-ablation-study/
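A minimal sketch of such an ablation with scikit-learn (the dataset and model here are illustrative, not from the tutorial notebooks): drop one component at a time, in this case one feature, and compare the cross-validated score against the full model.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Illustrative data and model; swap in your own pipeline.
X, y = load_iris(return_X_y=True, as_frame=True)
model = RandomForestClassifier(random_state=42)

# Baseline: all components switched on.
baseline = cross_val_score(model, X, y, cv=5).mean()
print(f"all features: {baseline:.3f}")

# Ablation: switch off one feature at a time and measure the effect.
for column in X.columns:
    ablated = X.drop(columns=[column])
    score = cross_val_score(model, ablated, y, cv=5).mean()
    print(f"without {column}: {score:.3f} (delta {score - baseline:+.3f})")
```

The same loop works for any component you can toggle: a preprocessing step, an augmentation, a loss term.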

🧠 Interpretability

This is a great communication tool for papers and meetings with domain scientists!

No one cares about your mean squared error!

How does the prediction depend on changing your input values?!

What features are important?!

https://dramsch.net/articles/euroscipy-2022/euroscipy-tutorial-5-interpretability/
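One way to answer both questions is permutation importance, which scikit-learn ships out of the box. This is a hedged sketch on a toy dataset, not the tutorial notebook itself: shuffle one feature at a time and watch how much the score drops.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative data; use your own domain dataset.
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# Shuffle each feature n_repeats times; a large score drop = important feature.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
)
ranked = sorted(
    zip(X.columns, result.importances_mean), key=lambda pair: -pair[1]
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

A ranked list of feature importances is something a domain scientist can argue with; a lone error metric is not.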

βš—οΈ Testing

I know code testing in science is hard.

Here are ways that make it incredibly easy:
β€’ Doctests for small examples
β€’ Data Tests for important samples
β€’ Deterministic tests for methods

https://dramsch.net/articles/euroscipy-2022/euroscipy-tutorial-4-testing/
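Two of those ideas fit in one tiny file. The function below is a made-up example (not from the tutorial): the docstring doubles as a doctest, and the deterministic test uses a fixed seed so the same input always produces the same result.

```python
import numpy as np


def zscore(values):
    """Standardize values to zero mean and unit variance.

    >>> zscore([1.0, 2.0, 3.0]).round(6).tolist()
    [-1.224745, 0.0, 1.224745]
    """
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def test_zscore_deterministic():
    # Fixed seed: the test input never changes between runs.
    rng = np.random.default_rng(seed=42)
    data = rng.normal(size=100)
    standardized = zscore(data)
    np.testing.assert_allclose(standardized.mean(), 0.0, atol=1e-12)
    np.testing.assert_allclose(standardized.std(), 1.0, atol=1e-12)


if __name__ == "__main__":
    import doctest

    doctest.testmod()
    test_zscore_deterministic()
```

Run it with `python -m doctest yourfile.py` or point pytest at it; either way the example in the docstring is checked for free.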

You can make your own life and that of collaborators 1000 times easier!

Use Input Validation.

Pandera is a nice little tool that lets you define what your input data should look like. Think:
β€’ Data Ranges
β€’ Data Types
β€’ Category Names

It's honestly a game changer and easy!

🀝 Model Sharing

Sharing models is great for reproducibility and collaboration.

Export your models and fix the random seed for paper submissions.

Share your dependencies in a requirements.txt or env.yml so other researchers can use & cite your work!

https://dramsch.net/articles/euroscipy-2022/euroscipy-tutorial-3-model-sharing/
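For scikit-learn models, exporting can be as small as this sketch (toy data, illustrative filename): fix the seed, fit, dump with joblib, and check that the reloaded model behaves identically.

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Fixed random seed, so the exported model is reproducible.
model = LogisticRegression(max_iter=1000, random_state=42).fit(X, y)

# Export the fitted model alongside the paper submission...
joblib.dump(model, "model.joblib")

# ...and reload it exactly as collaborators would.
restored = joblib.load("model.joblib")
assert (restored.predict(X) == model.predict(X)).all()
```

Pair the exported file with pinned dependencies (e.g. `pip freeze > requirements.txt`), since a pickled model generally needs matching library versions to load.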

Good code is easy to use and cite!

Use these libraries:
β€’ flake8 for linting
β€’ black for formatting

Write docstrings for docs!
(VS Code has a fantastic extension called autoDocstring)

Your peers will thank you.
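For reference, this is the kind of NumPy-style docstring skeleton that autoDocstring scaffolds for you; the function itself is a made-up example.

```python
def celsius_to_kelvin(temperature_c: float) -> float:
    """Convert a temperature from Celsius to Kelvin.

    Parameters
    ----------
    temperature_c : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in Kelvin.
    """
    return temperature_c + 273.15
```

Tools like Sphinx can turn these docstrings straight into rendered documentation, so the docs stay next to the code they describe.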

πŸ”¬ Benchmarking

Compare your models using the right metrics and benchmarks.

Here are great examples:
β€’ DummyClassifiers
β€’ Benchmark Datasets
β€’ Domain Methods
β€’ Linear Models
β€’ Random Forests

Always ground your model in the reality of science!

https://dramsch.net/articles/euroscipy-2022/euroscipy-tutorial-2-benchmarking/
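A compact version of that benchmark ladder in scikit-learn (dataset chosen for illustration): a dummy baseline, a linear model, and a random forest, all scored the same way so the comparison is fair.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# From dumb baseline up: each rung tells you how much the next one adds.
models = {
    "most-frequent dummy": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=5000),
    "random forest": RandomForestClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

If your fancy model barely beats the dummy, that's a result worth knowing before Reviewer 2 does.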

Proper benchmarks make stronger papers!
Metrics on their own don't paint a full picture.

Use benchmarks to tell a story of "how well your model should be doing" and disarm many comments by Reviewer 2 before they're written down.