The entire BBC In Our Time archive browsable by Dewey-Decimal code? Yes please

I made a website to find old episodes of In Our Time to listen to. There are almost a thousand episodes; it's my starting point for any new topic

Very early, suggestions welcome

https://genmon.github.io/braggoscope/

There's heavy use of GPT-3 in making this work! Both in extracting machine-readable data and in classifying episodes by library code
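As a rough illustration of that programmatic use, classification can be as simple as asking the model for a Dewey Decimal code given the episode metadata. This is a hypothetical sketch, not the actual Braggoscope prompt; the function and example episode text are made up for illustration.

```python
def build_dewey_prompt(title: str, synopsis: str) -> str:
    """Build a completion prompt asking GPT for a single Dewey Decimal code.

    Hypothetical sketch -- the real Braggoscope prompts aren't published here.
    The trailing "Dewey Decimal code:" nudges the model to answer with
    just the number.
    """
    return (
        "Classify the following BBC In Our Time episode with the single "
        "most appropriate Dewey Decimal code (e.g. 510 for Mathematics).\n\n"
        f"Title: {title}\n"
        f"Synopsis: {synopsis}\n\n"
        "Dewey Decimal code:"
    )

prompt = build_dewey_prompt(
    "The Fibonacci Sequence",
    "Melvyn Bragg and guests discuss the sequence of numbers 1, 1, 2, 3, 5, 8...",
)
```

The prompt string would then be sent to the OpenAI completions API, with the returned code parsed from the response.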

I feel like this programmatic use of LLMs is where AI gets really interesting

Details on the About page https://genmon.github.io/braggoscope/about

And some bonus material!

Here are the episodes on a chart (hover to see the title): embeddings -> principal component analysis -> the first two components (i.e. the most significant) plotted. Similar episodes end up "nearby". The code was provided by OpenAI; I didn't do anything special here

Could this lead somewhere interesting? Thinking...

https://interconnected.org/more/2023/02/in_our_time-PCA-plot.html

@genmon The scatter plot is super interesting. I want to learn more about the outliers! I’ve probably listened to every single episode that’s been podcasted. Beyond a Dewey-Decimal number, could you ask it for the top 3-5 tags? Then we could find “Money” topics across economics, society, history, etc.

What IoT topics cut across multiple categories?

@briansuda as it happens I did also request tags! They're unreliable, it turns out -- it seems you need a well-known controlled vocabulary to pin it down. And GPT is really bad at assigning multiple different topics to the same episode

Even when they did work, browsing wasn't significantly different from using "Similar episodes"

So I think maybe playing more with embedding space is the way forward. There's a technique called TCAV (testing with concept activation vectors) I want to try
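TCAV proper fits a linear classifier between concept examples and random examples to find a concept direction in activation space. A much simpler stand-in with episode embeddings: average the embeddings of a few exemplar episodes for a concept (say, "money") and score every episode by cosine similarity to that direction. A sketch under those assumptions, with tiny made-up vectors in place of real embeddings:

```python
import numpy as np

def concept_vector(exemplar_embeddings):
    """Normalised mean of exemplar embeddings: a crude concept direction.

    TCAV proper trains a linear classifier against random counterexamples;
    averaging is a simpler stand-in for illustration.
    """
    v = np.asarray(exemplar_embeddings, dtype=float).mean(axis=0)
    return v / np.linalg.norm(v)

def concept_scores(episode_embeddings, concept):
    """Cosine similarity of each episode embedding to the concept direction."""
    E = np.asarray(episode_embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    return E @ concept

# Toy 3-d "embeddings"; real ones would come from the OpenAI embeddings API
money = concept_vector([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])
scores = concept_scores([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], money)
```

Ranking episodes by these scores would give a "money-ness" ordering cutting across the Dewey categories, which is roughly what the tags were meant to do.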