Gotta say, as much as I've been disappointed by the lack of progress on the quality frontier, gpt-4o-mini has actually meaningfully improved my use cases, like using it for fast/cheap rejection sampling across large synthetic datasets.

@ericflo what is "fast/cheap rejection sampling across large synthetic datasets"?

@schizanon You generate a lot of data using one LLM, and then before training on it, you use another LLM to get rid of the worse stuff. I use Llama for the datagen, and gpt-4o-mini to reject the worse half.
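A minimal sketch of the pipeline described here: one model generates candidates, a judge scores them, and the bottom half gets dropped. `judge_score` is a hypothetical stand-in for the real judge call (e.g. prompting gpt-4o-mini to rate a sample), not an actual API.

```python
# Hedged sketch of generate-then-judge rejection filtering.
# judge_score is a placeholder heuristic standing in for a judge-LLM call.

def judge_score(sample: str) -> float:
    # Placeholder: in practice, prompt the judge model to rate quality.
    # Here we just reward lexical variety as a toy proxy.
    words = sample.split()
    return len(set(words)) / max(len(words), 1)

def reject_worse_half(samples: list[str]) -> list[str]:
    # Sort by judge score, keep only the top-scoring half.
    ranked = sorted(samples, key=judge_score, reverse=True)
    return ranked[: len(ranked) // 2]

samples = [
    "good unique words here",
    "bad bad bad bad",
    "varied rich text sample",
    "meh meh",
]
kept = reject_worse_half(samples)
print(kept)  # the two highest-variety samples survive
```

In a real pipeline the scoring call is the expensive step, which is why a fast/cheap judge model matters when the synthetic dataset is large.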

@ericflo got it, thanks!

OpenAI does something like that with #CriticGPT, right?

What purpose does one "generate a lot of data" for?

@schizanon They're all doing it. Rejection sampling is pretty much the first thing you do before moving forward with RLHF-style contrastive learning like PPO/DPO/KTO/RLOO etc. You do it when you want to get better at some task(s): you want more training data that elicits and demonstrates that behavior, and you use rejection to move the distribution in the direction you want.
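One way the judged samples feed into the contrastive methods mentioned above: methods like DPO train on (chosen, rejected) pairs per prompt, which you can build directly from judge rankings. This is a hypothetical sketch; `score` stands in for a judge-LLM call, and the dict layout is just one common pair format, not a specific library's schema.

```python
# Hedged sketch: turning judged completions into a DPO-style preference pair.
# score is a toy placeholder for a judge-model rating.

def score(completion: str) -> float:
    return float(len(completion))  # placeholder: longer == better

def make_preference_pair(prompt: str, completions: list[str]) -> dict:
    # Best-ranked completion becomes "chosen", worst becomes "rejected".
    ranked = sorted(completions, key=score, reverse=True)
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

pair = make_preference_pair(
    "Explain DNS.",
    ["short", "a much longer, detailed answer", "mid answer"],
)
print(pair["chosen"], "|", pair["rejected"])
```

The contrastive objective then pushes probability mass toward "chosen" and away from "rejected".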
@ericflo it's counterintuitive to me that you can generate training data and get something useful from it. Aren't you just training your model to generate what your generator already generated?
@schizanon That's what the rejection sampling does: shifts the distribution. The circuits are there, but minimized and maladaptive. Think about it like this: if it gets it right 1 out of every 100 times, then you need to maximize the probability of that 1.
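The "1 in 100" point can be made concrete with a toy simulation, assuming a hypothetical base policy that is correct about 1% of the time: the raw draws are almost all wrong, but after rejection the accepted set is 100% correct, so training on it concentrates probability on the rare desired behavior.

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

def base_policy() -> str:
    # Hypothetical policy: emits the right answer ~1% of the time.
    return "right" if random.random() < 0.01 else "wrong"

draws = [base_policy() for _ in range(10_000)]
accepted = [d for d in draws if d == "right"]  # rejection step: keep hits only

print(len(accepted) / len(draws))  # raw hit rate, ~0.01 in expectation
print(sum(d == "right" for d in accepted) / len(accepted))  # training set: all correct
```

The generator already "knows" the answer; rejection just filters the training signal so that the rare correct mode gets all the weight.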