Nicholas Guttenberg

Physicist/ML researcher creating artificial universes.
Materials also have "grains," and correspondingly we can define "grain spaces" (a term I get from @lea): abstractions that are compatible with the natural grain of a material but that reduce the design space to a set of choice combinations with desired properties. (Example: sound waves are a material; a musical harmony scheme is a grain space.)

Our new paper (led by newly minted graduate, Florian Lalande) on using machine learning to impute unknown exoplanet properties has just dropped!

Let's talk deets...

We're edging towards a stunning 6,000 exoplanet discoveries. It's a huge database that should be outright PLUNDERED for trends that can reveal how planets form and evolve. However, its use is throttled by the fact that the database has so many missing properties it's got more holes than a sponge. (cont...)

📘 : https://astro.theoj.org/article/124538-estimating-exoplanet-mass-using-machine-learning-on-incomplete-datasets

Estimating Exoplanet Mass using Machine Learning on Incomplete Datasets | Published in The Open Journal of Astrophysics

By Florian Lalande, Elizabeth Tasker & 1 more. A discussion of different methods for inferring exoplanet masses in catalogues with missing data

This is an app that people use to avoid getting lost when hiking back-country, out of cellphone coverage.

That can spontaneously decide, 'oh, you haven't called home recently, let's lock out the user until they get back online.'

Meaning that you might think you're fine (check that the app is working, start recording, etc.) only to find that, four hours into the wilderness, your map has basically deleted itself.

That is NOT OKAY.

Annoyed at Gaia GPS, but I realize I should actually be furious. I was recording a track to find my way back on mountain roads and, once I was out of cellphone coverage, it locked me out of the app (while it was actively up and recording, no less!) unless I went online and made an account with their new owner, Outside.

When I tried to find the option to just use it without an account, it deleted all my local data.

In the end I was fine and didn't need it. But it could have been quite serious.

At my old job, I was free to choose which license to put on my code.

At first I just used MIT, but later I was drawn to public-domain dedications like The Unlicense.

Now that I'm no longer employed there, the code that's public domain *feels* better. The MIT stuff is tainted with acrimony. It's not my code. It belongs to that company. But the public domain code belongs to everyone.

I'd encourage you to dedicate your work to the public domain whenever you can. I think it's a radical move.

The interesting thing, though, is that maybe you can actually do the 'normalizing flows' bit (which is really expensive in terms of architectural depth and limits) *after* training a non-invertible autoregressive model. That is, focus just on the 'flow' part of the architecture: convert sampling from an arbitrary p(x_n | x_1, ..., x_{n-1}) into a deterministic function of a sampled variable z, without having to build the entire logic of the model out of invertible layers. Maybe?
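A minimal sketch of what that 'flow bit' could look like, assuming (hypothetically) the trained model has a Gaussian output head: the per-step sampling becomes x_n = mu + sigma * z_n, a deterministic function of the history and z, with no invertibility required anywhere else. The model here is a toy AR(1)-style stand-in, not anything from the post.

```python
import numpy as np

# Hypothetical trained autoregressive model with a Gaussian output head:
# given the history, it predicts the mean and scale of the next value.
# (Toy AR(1)-style stand-in, purely for illustration.)
def gaussian_head(history):
    mu = 0.9 * (history[-1] if history else 0.0)
    sigma = 0.5
    return mu, sigma

def sample_with_seed(z_vector):
    """Each step is x_n = mu + sigma * z_n (the reparameterization trick),
    so the whole sample is a deterministic function of the vector z."""
    xs = []
    for z in z_vector:
        mu, sigma = gaussian_head(xs)
        xs.append(mu + sigma * z)  # no internal randomness
    return xs

z = np.random.default_rng(0).standard_normal(5)
assert sample_with_seed(z) == sample_with_seed(z)  # deterministic in z
```

For a categorical head, the analogue would be inverse-CDF sampling driven by a uniform z, which is where the trouble discussed below starts.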
So the thing I need to figure out is: can you structure things so as to *guarantee* that z absorbs all of the entropy? If you do that, though, I think you just end up re-inventing normalizing flows.
What if you do tricks like introducing a hidden variable? Let's say my autoregressive model normally learns p(x_n | x_1, ..., x_{n-1}), but instead I add a randomly sampled extra vector z, so that the thing in my loss is ∫ p(z) p(x_n | x_1, ..., x_{n-1}, z) dz, and 'z' becomes something like a seed. Well, z can absorb some of the entropy of the individual token sampling events, but unless it absorbs *all* of it, you still ultimately diverge.
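A toy illustration of that partial-absorption failure mode (a hypothetical model of my own, not anything from the post): share z between two rollouts but let a little residual sampling noise remain, and the trajectories no longer match, because the unabsorbed noise feeds back through the context.

```python
import numpy as np

def step(history, z, eps):
    # z is the shared seed-like variable; eps is the leftover sampling
    # entropy that z failed to absorb (both choices are hypothetical)
    mu = 0.9 * (history[-1] if history else 0.0)
    return mu + 0.5 * z + 0.1 * eps

def rollout(z_vec, noise_rng):
    xs = []
    for z in z_vec:
        xs.append(step(xs, z, noise_rng.standard_normal()))
    return np.array(xs)

z = np.random.default_rng(0).standard_normal(50)
a = rollout(z, np.random.default_rng(1))       # shared z, noise stream 1
b = rollout(z, np.random.default_rng(2))       # shared z, noise stream 2
replay = rollout(z, np.random.default_rng(1))  # shared z AND shared noise

assert np.allclose(a, replay)  # z plus the residual noise => reproducible
assert not np.allclose(a, b)   # z alone => the trajectories drift apart
```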

So the consequence is: if, for example, you're generating a distribution of sentences autoregressively with an LLM and you forcibly 'fix' a particular token (as opposed to achieving it by filtering), the new distribution of sentences increasingly diverges from the old one in the 'forward' direction of your autoregressive model.
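Here's a tiny enumeration (a toy binary model of my own, purely illustrative) showing that forcing a token is not the same distribution as filtering on it: filtering updates the tokens *before* the fixed position via Bayes, forcing leaves them at their prior, and that mismatch then propagates forward through the context.

```python
import itertools
import numpy as np

def p_one(history):
    # toy autoregressive model: P(next bit = 1) leans toward the
    # running fraction of ones so far (an arbitrary illustrative choice)
    if not history:
        return 0.5
    return 0.25 + 0.5 * sum(history) / len(history)

N, POS, VAL = 4, 1, 1  # fix x_2 = 1 (0-indexed position 1)

def joint(seq, force=False):
    p = 1.0
    for i, x in enumerate(seq):
        if force and i == POS:
            p *= 1.0 if x == VAL else 0.0  # token clamped, no Bayes update
        else:
            q = p_one(seq[:i])
            p *= q if x == 1 else 1.0 - q
    return p

seqs = list(itertools.product([0, 1], repeat=N))
forced = np.array([joint(s, force=True) for s in seqs])
filtered = np.array([joint(s) if s[POS] == VAL else 0.0 for s in seqs])
filtered /= filtered.sum()  # condition (i.e. filter) on x_2 = 1

tv = 0.5 * np.abs(forced - filtered).sum()
assert tv > 0.01  # the two "fix this token" distributions genuinely differ
```

The gap comes entirely from x_1: under filtering, P(x_1 = 1 | x_2 = 1) is pulled up from 0.5, while forcing leaves x_1 at its prior, and everything downstream inherits the difference.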

This is why in diffusion models you can change the prompt but keep the 'seed' and get a related image, while the same trick doesn't work for LLMs.
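A toy contrast (both "models" hypothetical): a diffusion-style sample is one deterministic function of the noise, so nudging the conditioning nudges the output; an autoregressive sampler drawing tokens by inverse-CDF from the same uniform stream can flip a single token, after which the changed context lets everything downstream differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Diffusion-style: one deterministic decoder applied to a fixed noise z.
z = rng.standard_normal(8)
def diffusion_like(strength):
    return np.tanh(strength * z)  # toy stand-in for a denoising decoder

img_a = diffusion_like(1.0)
img_b = diffusion_like(1.1)  # "change the prompt", keep the seed
# outputs stay close: |tanh(1.1 z) - tanh(z)| <= 0.1 * |z| pointwise

# LLM-style: tokens drawn one at a time by inverse-CDF sampling; reusing
# the uniform stream `us` is the closest analogue of keeping the seed.
def autoregressive_like(bias, us):
    toks = []
    for u in us:
        prev = toks[-1] if toks else 0
        p1 = 0.3 + 0.4 * prev + bias  # toy conditional; bias = "prompt"
        toks.append(1 if u < p1 else 0)
    return toks

us = rng.random(50)
seq_a = autoregressive_like(0.0, us)
seq_b = autoregressive_like(0.1, us)
# once a single comparison u < p1 flips, the context changes and the two
# sequences are free to disagree at every later step
```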