Our new paper (led by newly minted graduate, Florian Lalande) on using machine learning to impute unknown exoplanet properties has just dropped!
Let's talk deets...
We're edging towards a stunning 6,000 exoplanet discoveries. It's a huge database that should be outright PLUNDERED for trends that can reveal how planets form and evolve. However, its use is throttled by the fact that the database has so many missing properties it's got more holes than a sponge. (cont...)
This is an app that people use to avoid getting lost when hiking in the back-country, out of cellphone coverage.
That can spontaneously decide 'oh, you haven't called home recently, let's lock out the user until they get back online'.
Meaning that you might think you're fine - check that the app is working, start recording, etc. - only to find that, 4 hours into the wilderness, your map has basically deleted itself.
That is NOT OKAY.
Annoyed at Gaia GPS but I realize I should actually be furious. I was recording a track to find my way back on mountain roads and, once I was out of cellphone coverage, it locked me out of the app (while it was actually up and recording, no less!) unless I went online and made an account with their new owner, Outside.
When I tried to find the option to just use it without an account, it deleted all my local data.
In the end I was fine and didn't need it. But it could have been quite serious.
At my old job, I was free to choose which license to put on my code.
At first I just used MIT, but later was drawn to public domain licenses like The Unlicense.
Now that I'm no longer employed there, the code that's public domain *feels* better. The MIT stuff is tainted with acrimony. It's not my code. It belongs to that company. But the public domain code belongs to everyone.
I'd encourage you to dedicate your work to the public domain whenever you can. I think it's a radical move.
So the consequence is: if, for example, you're generating a distribution of sentences autoregressively with an LLM and you forcibly 'fix' a particular token (as opposed to achieving it by filtering the samples), the new distribution of sentences increasingly diverges from the old one in the 'forward' direction of your autoregressive model.
This is why, with diffusion models, you can change the prompt but keep the 'seed' and get a related image, whereas with LLMs you can't.
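The forcing-vs-filtering distinction above can be made concrete with a toy model. This is a minimal sketch, not an LLM: a 3-position binary Markov chain with made-up probabilities stands in for the autoregressive model, and we compare (a) filtering (rejection-sampling to the true conditional given a token value) against (b) forcing (overwriting that token mid-generation). Forcing leaves the distribution *before* the fixed token unchanged and only diverges from the original in the forward direction, while filtering shifts the earlier tokens too.

```python
import itertools

# Toy autoregressive model over binary tokens at 3 positions.
# All probabilities are invented for illustration only.
p1 = {0: 0.7, 1: 0.3}                              # p(x1)
p2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # p(x2 | x1)
p3 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}    # p(x3 | x2)

seqs = list(itertools.product([0, 1], repeat=3))

def p(x):
    """Original joint probability of a sequence (x1, x2, x3)."""
    x1, x2, x3 = x
    return p1[x1] * p2[x1][x2] * p3[x2][x3]

orig = {x: p(x) for x in seqs}

# (a) Filtering / rejection: keep only sequences with x2 == 1 and
# renormalise -- this is the true conditional p(x | x2 = 1).
z = sum(pr for x, pr in orig.items() if x[1] == 1)
filt = {x: (pr / z if x[1] == 1 else 0.0) for x, pr in orig.items()}

# (b) Forcing: sample x1 as usual, overwrite x2 = 1, then keep
# sampling forward -- q(x) = p(x1) * [x2 = 1] * p(x3 | x2 = 1).
forced = {x: (p1[x[0]] * p3[1][x[2]] if x[1] == 1 else 0.0) for x in seqs}

def marg(dist, i):
    """Marginal probability that token at position i equals 1."""
    return sum(pr for x, pr in dist.items() if x[i] == 1)

# Backward direction (prefix): forcing matches the original,
# filtering does not.
print("P(x1=1):", marg(orig, 0), round(marg(filt, 0), 3), marg(forced, 0))
# Forward direction (suffix): forcing and filtering agree, and
# both diverge from the original.
print("P(x3=1):", round(marg(orig, 2), 3), marg(filt, 2), marg(forced, 2))
```

Running this, the prefix marginal P(x1=1) stays at its original value under forcing but shifts under filtering, while the suffix marginal P(x3=1) diverges from the original in both cases: forcing only propagates the change forward, exactly the asymmetry described above.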