Looking for advice on FDR thresholds in #GSEA: Do you typically use FDR < 0.25 or FDR < 0.05 to determine significance? How does your threshold choice change depending on permutation_type='phenotype' vs. permutation_type='gene_set'? #bioinformatics #rnaseq
@victorjavierlo 0.05, but this is (in theory) decided before seeing the results, and it doesn't take biological meaning or relevance into account. (You probably knew that already.)
@victorjavierlo I typically do gene set permutation with fgsea, which I think does a stricter padj calculation. Regardless, I use 0.05, or 0.1 if it's more exploratory. Also, I typically select specific gene sets or categories to test.
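For context on the padj mentioned above: fgsea reports padj as a Benjamini-Hochberg (BH) adjusted p-value. Below is a minimal pure-Python sketch of BH adjustment, to show how the 0.05 / 0.1 / 0.25 cutoffs select different numbers of gene sets. The function name `bh_adjust` and the p-values are made up for illustration; this is not fgsea's code.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values, invented for illustration only.
toy_pvalues = [0.01, 0.04, 0.03, 0.20]
toy_padj = bh_adjust(toy_pvalues)

# Count how many gene sets pass each cutoff discussed in the thread.
passing = {cutoff: sum(p < cutoff for p in toy_padj) for cutoff in (0.05, 0.1, 0.25)}
```

With these toy values, the looser cutoffs let through progressively more gene sets, which is the trade-off being debated here.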
@keyboardpipette GSEA webpage: An FDR of 25% indicates that the result is likely to be valid 3 out of 4 times, which is reasonable for exploratory results where one is interested in finding candidate hypothesis to be further validated as a results of future research. Given the lack of coherence in most datasets and the relatively small number of gene sets being analyzed, using a more stringent FDR cutoff may lead you to overlook potentially significant results. https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/FAQ#Why_does_GSEA_use_a_false_discovery_rate_.28FDR.29_of_0.25_rather_than_the_more_classic_0.05.3F
@keyboardpipette The GSEA web page is not very clear about it. I also read this in a forum: GSEA recommends a FDR threshold of 0.25 when running in the Phenotype permutation mode. Generally with the gene set mode it is better to use a threshold of 0.05 just based on the nature of the test being performed. https://groups.google.com/g/gsea-help/c/0RIVtr7aESQ
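The rule of thumb quoted from the forum can be written as a small lookup. This is only a sketch: the names `FDR_CUTOFFS` and `significant_sets` are hypothetical, the keys reuse the `permutation_type` values from the original question, and the cutoffs are the conventions quoted in this thread, not something any GSEA tool enforces.

```python
# Cutoffs quoted in this thread (conventions, not tool defaults).
FDR_CUTOFFS = {"phenotype": 0.25, "gene_set": 0.05}


def significant_sets(results, permutation_type):
    """Return names of gene sets whose FDR q-value is below the cutoff.

    `results` is a list of (gene_set_name, fdr_q_value) pairs.
    """
    cutoff = FDR_CUTOFFS[permutation_type]
    return [name for name, fdr in results if fdr < cutoff]


# Toy results, invented for illustration only.
toy_results = [("SET_A", 0.03), ("SET_B", 0.12), ("SET_C", 0.30)]
phenotype_hits = significant_sets(toy_results, "phenotype")
gene_set_hits = significant_sets(toy_results, "gene_set")
```

With these toy values, SET_B counts as a hit under phenotype permutation but not under gene set permutation, so the same q-value is read differently depending on the mode.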
@victorjavierlo @keyboardpipette Hmm, I'm not sure about this reasoning. I'm not saying 0.05 is better either. Why don't we use a 0.25 alpha in any other hypothesis test? I think a hard cutoff at any threshold might miss relevant results.
But I agree on subsetting and carefully selecting the identifiers and database for this test.
Some time ago, I tried to move this area toward a more Bayesian view, but I didn't manage to compute a representative sample of the possible gene sets/pathways.