The entire BBC In Our Time archive browsable by Dewey-Decimal code? Yes please

I made a website to find old episodes of In Our Time to listen to. There are almost a thousand episodes; it's my starting point for any new topic

Very early, suggestions welcome

https://genmon.github.io/braggoscope/

There's heavy use of GPT-3 in making this work! Both in extracting machine-readable data and in classifying episodes by library code
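As a rough illustration of that programmatic use, classification can be as simple as asking the model for a Dewey Decimal code given the episode metadata. This is a hypothetical sketch, not the actual Braggoscope prompt; the function and example episode text are made up for illustration.

```python
def build_dewey_prompt(title: str, synopsis: str) -> str:
    """Build a completion prompt asking GPT for a single Dewey Decimal code.

    Hypothetical sketch -- the real Braggoscope prompts aren't published here.
    The trailing "Dewey Decimal code:" nudges the model to answer with
    just the number.
    """
    return (
        "Classify the following BBC In Our Time episode with the single "
        "most appropriate Dewey Decimal code (e.g. 510 for Mathematics).\n\n"
        f"Title: {title}\n"
        f"Synopsis: {synopsis}\n\n"
        "Dewey Decimal code:"
    )

prompt = build_dewey_prompt(
    "The Fibonacci Sequence",
    "Melvyn Bragg and guests discuss the sequence of numbers 1, 1, 2, 3, 5, 8...",
)
```

The prompt string would then be sent to the OpenAI completions API, with the returned code parsed from the response.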

I feel like this programmatic use of LLMs is where AI gets really interesting

Details on the About page https://genmon.github.io/braggoscope/about

And some bonus material!

Here are the episodes on a chart (hover to see the title): embeddings -> principal component analysis -> the first two components (i.e. the most significant) plotted. Similar episodes end up "nearby". The code was provided by OpenAI; I didn't do anything special here

Could this lead somewhere interesting? Thinking...

https://interconnected.org/more/2023/02/in_our_time-PCA-plot.html

@genmon The scatter plot is super interesting. I want to learn more about the outliers! I’ve probably listened to every single episode that’s been podcasted. Beyond a Dewey-Decimal number, could you ask it for the top 3-5 tags? Then we could find “Money” topics across economics, society, history, etc.

What IoT topics cut across multiple categories?

@briansuda as it happens I did also request tags! They're unreliable, it turns out -- it seems you need a well-known controlled vocabulary to pin it down. And GPT is really bad at assigning multiple different topics to the same episode

Even when they did work, browsing wasn't significantly different from using "Similar episodes"

So I think maybe playing more with embedding space is the way forward. There's a technique called TCAV (testing with concept activation vectors) I want to try
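TCAV proper fits a linear classifier between concept examples and random examples to find a concept direction in activation space. A much simpler stand-in with episode embeddings: average the embeddings of a few exemplar episodes for a concept (say, "money") and score every episode by cosine similarity to that direction. A sketch under those assumptions, with tiny made-up vectors in place of real embeddings:

```python
import numpy as np

def concept_vector(exemplar_embeddings):
    """Normalised mean of exemplar embeddings: a crude concept direction.

    TCAV proper trains a linear classifier against random counterexamples;
    averaging is a simpler stand-in for illustration.
    """
    v = np.asarray(exemplar_embeddings, dtype=float).mean(axis=0)
    return v / np.linalg.norm(v)

def concept_scores(episode_embeddings, concept):
    """Cosine similarity of each episode embedding to the concept direction."""
    E = np.asarray(episode_embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    return E @ concept

# Toy 3-d "embeddings"; real ones would come from the OpenAI embeddings API
money = concept_vector([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])
scores = concept_scores([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], money)
```

Ranking episodes by these scores would give a "money-ness" ordering cutting across the Dewey categories, which is roughly what the tags were meant to do.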