Our paper (with Julie Cartier, Johanna Lagoas, Youmna Ayadi, Adeline Fermanian and @flomass) on the use of statistical knockoffs for the differential analysis of transcriptomics data just came out, very appropriately as it nicely illustrates my point:
https://academic.oup.com/bib/article/27/3/bbag148/8687371

Using simulated outcomes on real transcriptomics data, we've shown that knockoffs (KOs), and in particular the KOPI approach, do retrieve important variables with better power than classical approaches (Wilcoxon, Lasso), while controlling the FDR.

However, all methods perform poorly when the relationship between gene expressions and outcome is nonlinear.

On real outcomes, the method is overly conservative (having no discoveries is a surefire way of controlling your number of false discoveries), and we had to raise the false discovery rate threshold to 50% to select any gene at all.
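
To make the "no discoveries" point concrete, here is a minimal sketch of the generic knockoff+ selection rule (Barber and Candès). It is not the KOPI procedure from the paper, and the statistics in `W` are made-up numbers for illustration: a large positive `W[j]` means the real feature beat its knockoff copy.

```python
import math

def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q, else infinity."""
    for t in sorted({abs(w) for w in W if w != 0}):
        n_neg = sum(w <= -t for w in W)   # knockoffs that "won": proxy for false discoveries
        n_pos = sum(w >= t for w in W)    # features that would be selected at threshold t
        if (1 + n_neg) / max(1, n_pos) <= q:
            return t
    return math.inf                       # no threshold qualifies: nothing is selected

def knockoff_select(W, q):
    """Indices of the features selected at target FDR level q."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]

# Hypothetical knockoff statistics for six features.
W = [5.0, 4.0, 3.0, -1.0, 0.5, -0.2]
print(knockoff_select(W, q=0.1))  # → []
print(knockoff_select(W, q=0.5))  # → [0, 1, 2, 4]
```

Because of the `+ 1` in the numerator, this rule can only select something if at least 1/q features pass the threshold, which is one reason a strict FDR level can return nothing when the signal is weak.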

#machineLearning #genomics #featureSelection #biomarkerDiscovery #transcriptomics

Title: P2: I have been reading about "feature selection" for ML. [2024-10-13 Sun]
- http://www.feat.engineering/feature-selection
#ml #machinelearning #datascience #dailyreport #featureselection
1.4 Feature Selection | Feature Engineering and Selection: A Practical Approach for Predictive Models

A primary goal of predictive modeling is to find a reliable and effective predictive relationship between an available set of features and an outcome. This book provides an extensive set of techniques for uncovering effective representations of the features for modeling the outcome and for finding an optimal subset of features to improve a model’s predictive performance.

Title: P1: I have been reading about "feature selection" for ML. [2024-10-13 Sun]
- Unsupervised - based on correlations among the features
themselves, without the target. Ex.: PCA, t-SNE, autoencoders,
independent component analysis (ICA)

Interesting methods are “Stepwise forward/backward
selection”, “Simulated Annealing (SA)” and “Genetic
Algorithms”.

Links:
- Applied Predictive Modeling. Max Kuhn. Kjell
Johnson. Springer. #ml #machinelearning #datascience #dailyreport #featureselection

Title: P0: I have been reading about "feature selection" for ML. [2024-10-13 Sun]
Feature selection means eliminating candidate features from the dataset.

Main types:
- Intrinsic (or implicit) methods, also called embedded methods -
selection happens inside model training.
- Filter methods - score features before training,
e.g. by correlation or an F-test.
- Wrapper methods - ex. forward/backward selection...

May also be categorized as:
- Supervised - based on correlation with the target #ml #machinelearning #datascience #dailyreport #featureselection

Feature Selection: A Simplified Guide

“Feature selection is a key step in ML: it reduces dimensionality and complexity by keeping only the most informative variables.
It improves accuracy, speeds up training, and enhances interpretability, in both supervised and unsupervised tasks.”

📎https://tiagoribeiro.vercel.app/blog_posts/4_feature_selection.html
#MachineLearning #FeatureSelection #AI

What is Feature Selection? An A-Z guide to feature selection in ML

In machine learning, feature selection acts as a smart filter that helps refine a model's input data. By screening out and removing low-value or noisy features, it directly contributes to improving the performance and reliability of predictive models. The following article clarifies what feature selection is and details its advantages.

Read now: https://interdata.vn/blog/feature-selection-la-gi/
#interdata #featureselection

⚙️ Hybrid Approach: The Hybrid method combines the strengths of both Filter and Wrapper approaches, offering a balance between speed and accuracy. By using a filter to narrow down the features and a wrapper for fine-tuning, it provides an effective and efficient feature selection process. #DataScience #ML #FeatureSelection #DataTalksClub #zoomcamp
🤖 Ensemble Approach: This technique combines multiple models to select the best features. By using multiple algorithms and aggregating their results, it improves robustness and accuracy. Common methods include Random Forest and Gradient Boosting. #MachineLearning #AI #FeatureSelection #DataTalksClub #zoomcamp
🎯 Wrapper Approach: Unlike the Filter approach, the Wrapper method evaluates feature subsets by training a model. It iteratively adds or removes features to find the optimal set. While more computationally expensive, it tends to provide better results when combined with powerful models. #DataScience #ML #FeatureSelection #DataTalksClub #zoomcamp
🔍 Filter Approach: This technique evaluates the relevance of features based on statistical tests, such as correlation, chi-square, or mutual information. It selects features independently of the learning algorithm, making it fast but sometimes less accurate. #FeatureSelection #DataScience #ML #DataTalksClub #zoomcamp
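
The filter, wrapper and hybrid approaches from the posts above can be sketched in a few lines. This is only an illustration under stated assumptions: the data is synthetic (only features 0, 1 and 2 drive the outcome), the filter is an absolute Pearson correlation ranking, the wrapper is greedy forward selection scored on a held-out half, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 300, 30
X = rng.normal(size=(n, p))
# Toy outcome: only features 0, 1 and 2 matter (an assumption of this example).
y = 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

def filter_by_correlation(X, y, k):
    """Filter step: keep the k features with the largest |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:k]

def validation_mse(X_tr, y_tr, X_va, y_va, cols):
    """Fit least squares (with intercept) on the chosen columns, score on held-out data."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr[:, cols]])
    A_va = np.column_stack([np.ones(len(X_va)), X_va[:, cols]])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.mean((A_va @ coef - y_va) ** 2))

def forward_selection(X, y, candidates, n_select):
    """Wrapper step: greedily add the candidate that most lowers validation error."""
    half = len(X) // 2
    X_tr, y_tr, X_va, y_va = X[:half], y[:half], X[half:], y[half:]
    chosen = []
    for _ in range(n_select):
        remaining = [int(j) for j in candidates if int(j) not in chosen]
        best = min(
            remaining,
            key=lambda j: validation_mse(X_tr, y_tr, X_va, y_va, chosen + [j]),
        )
        chosen.append(best)
    return chosen

# Hybrid: a cheap filter narrows 30 features to 10, then the wrapper fine-tunes.
shortlist = filter_by_correlation(X, y, k=10)
selected = forward_selection(X, y, shortlist, n_select=3)
print("shortlist:", sorted(int(j) for j in shortlist))
print("selected:", selected)
```

The filter never trains a model, so it is cheap; the wrapper refits the model for every candidate at every step, so running it only on the shortlist is where the hybrid saves time.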