Please help me, fellow MastoNerds: What is the appropriate statistical distribution for responses to questions in which participants are asked to rank-order groups of stimuli? And how would it be implemented in pymc3/Stan/Turing.jl? The closest thing I can find is the ordered probit model.

Please and thank you 🙏

#statistics #stan #rlang #julialang #Probability

@tmcurley
Let me be sure I'm getting the question right... A person is given, say, 4 different foods and is asked to assign the numbers 1, 2, 3, 4, one to each food? Something like that?
@dlakelan yes, that's exactly right! You understand perfectly
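For concreteness, here is a minimal Python sketch (standard library only) of the two equivalent ways to record one such response: a rank per item, as in the question, or the list of items in rank order. The helper names and the example response are made up for illustration.

```python
def is_valid_ranking(ranks):
    # True if the ranks are exactly 1..n, each used once.
    return sorted(ranks) == list(range(1, len(ranks) + 1))

def ranks_to_order(ranks):
    # ranks[i] is the rank given to item i+1 (1-based labels, as in the thread).
    # Returns the items listed by rank position: order[k] is the item at rank k+1.
    order = [0] * len(ranks)
    for item, r in enumerate(ranks, start=1):
        order[r - 1] = item
    return order

ranks = [1, 4, 2, 3]            # hypothetical response: food 1 -> 1, food 2 -> 4, ...
print(is_valid_ranking(ranks))  # True
print(ranks_to_order(ranks))    # [1, 3, 4, 2]
```

The two representations are inverse permutations of each other, so it helps to fix one convention before writing a likelihood.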
@tmcurley
So what you're measuring is a function from the items 1,2,3,4 to the ranks 1,2,3,4. Any permutation is possible, so there are 4! = 24 of them, and you can think of the outcomes as just the numbers 1..24 if you like. Of course, with more than 4 options the factorial grows rapidly, so this doesn't scale well. Another way to handle this is to consider the function itself. You've got a mass of "like", which we can think of as one unit.
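A quick Python sketch of the "one categorical outcome per permutation" view, assuming outcomes are labelled in the lexicographic order that itertools.permutations produces (any fixed labelling would do). It also shows how fast n! grows.

```python
import math
from itertools import permutations

n = 4
# Every possible ordering of n items is one category; permutations() of a
# sorted input yields them in lexicographic order.
outcomes = list(permutations(range(1, n + 1)))
# Map each ordering to a category label 1..n!.
label_of = {perm: i + 1 for i, perm in enumerate(outcomes)}

print(len(outcomes))            # 24
print(label_of[(1, 3, 4, 2)])   # 4

# The number of categories explodes with the number of items.
for k in (4, 8, 10):
    print(k, math.factorial(k))  # n! = 24, 40320, 3628800
```

A plain categorical over 24 outcomes is easy to fit; at 10 items it would need 3,628,800 categories, which is why modelling the underlying preferences is more attractive.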
@tmcurley
The one unit of like gets allocated among the numbers 1..N for the N options. This may sound familiar: it's the same as probability spread among discrete outcomes, i.e. a Dirichlet distribution! I think I would model the underlying like as Dirichlet, and then model the outcomes as either compatible or incompatible with the Dirichlet draw (basically, reject any sample where the observed rankings are incompatible with the underlying Dirichlet preferences).
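A minimal sketch of the reject-incompatible-draws idea in plain Python, for a single respondent. Assumptions (not from the thread): a Dirichlet draw is built from normalized Gamma variates (the standard construction), a larger q means more liked, and the observed order is listed from least to most liked, matching the q[1] < q[3] < q[4] < q[2] convention used later in the thread. The function names and the flat alpha are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    # Standard construction: normalize independent Gamma(alpha_k, 1) draws.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def compatible(q, order):
    # Assumed convention: q increases along the listed order (1-based item labels).
    return all(q[a - 1] < q[b - 1] for a, b in zip(order, order[1:]))

def rejection_sample(alpha, order, n_keep, seed=0):
    # Keep only Dirichlet draws consistent with one observed ranking.
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_keep:
        q = sample_dirichlet(alpha, rng)
        if compatible(q, order):
            kept.append(q)
    return kept

alpha = [1.0, 1.0, 1.0, 1.0]            # hypothetical flat prior, 4 options
draws = rejection_sample(alpha, [1, 3, 4, 2], n_keep=1000)
# Posterior mean weight per option: ordered the same way as the ranking.
post_mean = [sum(q[k] for q in draws) / len(draws) for k in range(4)]
print([round(m, 3) for m in post_mean])
```

Under a symmetric Dirichlet every ordering is equally likely, so the acceptance rate is 1/n!: about 1 in 24 here, but about 1 in 3.6 million for 10 items. This brute-force version is only for building intuition.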
@tmcurley
This is all based on a few mins of thinking about it so maybe you can start there and see if you get anywhere?
@tmcurley
Yep, I like the Dirichlet model. You could either have the likelihood be 0 for incompatible and 1 for compatible assignments, or, if you think there is a possibility of "assignment error", you could devise a continuous weighting scheme, say with the sum of squared errors. That will likely sample a lot better.
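Both options sketched in Python on the log scale. The hard version is the 0/1 indicator. The soft version is one hypothetical way to use a sum of squared errors: square the size of each violated adjacent inequality and apply a Gaussian-style penalty with a made-up scale sigma. It is continuous in q and agrees with the hard version whenever the ranking is compatible, which is the kind of smoothness gradient-based samplers prefer.

```python
import math

def hard_loglik(q, order):
    # 0/1 likelihood on the log scale: 0 if compatible, -inf otherwise.
    ok = all(q[a - 1] < q[b - 1] for a, b in zip(order, order[1:]))
    return 0.0 if ok else -math.inf

def soft_loglik(q, order, sigma=0.05):
    # Hypothetical "assignment error" version: squared size of each violated
    # adjacent inequality, scaled by a made-up sigma. Continuous in q, 0 when
    # the ranking is fully compatible, and approaches the hard version as sigma -> 0.
    sse = sum(max(0.0, q[a - 1] - q[b - 1]) ** 2 for a, b in zip(order, order[1:]))
    return -sse / (2 * sigma ** 2)

order = [1, 3, 4, 2]
q_ok = [0.1, 0.4, 0.2, 0.3]     # q1 < q3 < q4 < q2: compatible
q_bad = [0.1, 0.4, 0.3, 0.2]    # q3 > q4: one small violation

print(hard_loglik(q_bad, order))   # -inf
print(soft_loglik(q_bad, order))   # roughly -2.0
```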

@tmcurley

Shower thoughts... remember that "compatibility" here means the order of preference is correct. So if there are 4 options and they rank them 1,3,4,2 (listed from least to most preferred), then the underlying Dirichlet draw q is compatible if q[1] < q[3] < q[4] < q[2], which yields likelihood 1, and 0 otherwise.
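Written as a formula, with o = (o_1, ..., o_n) the observed order (least to most preferred) and q the Dirichlet weights, the likelihood and the resulting posterior are:

```latex
p(o \mid q) = \mathbb{1}\!\left[\, q_{o_1} < q_{o_2} < \cdots < q_{o_n} \right],
\qquad
p(q \mid o) \propto \mathrm{Dirichlet}(q \mid \alpha)\,
  \mathbb{1}\!\left[\, q_{o_1} < q_{o_2} < \cdots < q_{o_n} \right]
```

For o = (1, 3, 4, 2) the indicator is 1[q_1 < q_3 < q_4 < q_2]. Since ties have probability zero, exactly one ordering is compatible with any given q, so these indicators sum to 1 over all n! orderings and the 0/1 likelihood is a proper distribution over rankings.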

@dlakelan thanks for the great info! The number of potential permutations could become a bit of a problem, as we asked participants to rank-order as many as 10 items in a group (10! = 3,628,800 possible orderings), although R and Julia can handle those computations fairly easily. I think the structure you proposed (x1 > ... > xi) will be especially helpful for developing likelihood estimates of rank orders of subsets of elements, like assessing "red > blue" within a corpus of rank orders of 8 different colors.
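A model-free first look at a sub-ordering like "red > blue" is just the share of responses in which those items appear in that relative order. A minimal Python sketch; the helper names and the four made-up responses are hypothetical, and "ahead of" follows whatever direction the responses are listed in. With the Dirichlet model, the analogous quantity would be the share of posterior draws whose q values satisfy the same inequality.

```python
def consistent(order, sub):
    # True if the items in `sub` appear in `order` in the same relative order.
    pos = [order.index(item) for item in sub]
    return all(a < b for a, b in zip(pos, pos[1:]))

def rate(orders, sub):
    # Share of responses consistent with the sub-ordering.
    return sum(consistent(o, sub) for o in orders) / len(orders)

# Made-up responses over 4 colours, each listed in the respondent's order.
orders = [
    ["red", "green", "blue", "yellow"],
    ["blue", "red", "yellow", "green"],
    ["red", "yellow", "green", "blue"],
    ["green", "red", "blue", "yellow"],
]
print(rate(orders, ["red", "blue"]))            # 0.75
print(rate(orders, ["red", "green", "blue"]))   # 0.5
```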

@tmcurley

Yes, I would absolutely go with the Dirichlet formulation. It's not clear how you're going to use the model, but if you're interested in what's typical and how much preferences vary, this is probably the way to go. If you're predicting preferences from covariates etc., you could use your covariate-based predictions to set the parameters of the Dirichlet.
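One hypothetical way to let covariates drive the Dirichlet, sketched in Python: a log link, so each alpha_k = exp(intercept_k + x · beta_k) stays positive. The link, the names and the coefficient values are assumptions for illustration, not a fitted model. The Dirichlet mean alpha_k / sum(alpha) is then the expected share of "like" for option k.

```python
import math

def dirichlet_alpha(x, intercept, beta):
    # Hypothetical log-link regression: alpha_k = exp(intercept_k + x . beta_k),
    # which keeps every Dirichlet parameter positive.
    return [
        math.exp(b0 + sum(xj * bj for xj, bj in zip(x, bk)))
        for b0, bk in zip(intercept, beta)
    ]

intercept = [0.0, 0.0, 0.0]      # hypothetical values, 3 options
beta = [[0.5], [0.0], [-0.5]]    # one covariate, one coefficient per option
alpha = dirichlet_alpha([1.0], intercept, beta)
# Dirichlet mean: expected share of "like" for each option.
mean = [a / sum(alpha) for a in alpha]
print([round(m, 3) for m in mean])
```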