The entire BBC In Our Time archive browsable by Dewey-Decimal code? Yes please

I made a website to find old episodes of In Our Time to listen to. There are almost a thousand episodes, and the archive is my starting point for any new topic

Very early, suggestions welcome

https://genmon.github.io/braggoscope/

Explore the In Our Time archive

Braggoscope

There's heavy use of GPT-3 in making this work! Both in extracting machine-readable data, and classifying episodes by library code

I feel like this programmatic use of LLMs is where AI gets really interesting

Details on the About page https://genmon.github.io/braggoscope/about
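A rough sketch of the "classify by library code" step, for the curious. This is not the code behind the site (the About page has the real details). It only shows one way to pin a free-text model reply down to a Dewey main class. `parse_ddc` and the sample reply string are made up for illustration, and no model is actually called here.

```python
import re

# The ten Dewey Decimal main classes, keyed by the leading digit of the code.
DDC_MAIN_CLASSES = {
    0: "Computer science, information & general works",
    1: "Philosophy & psychology",
    2: "Religion",
    3: "Social sciences",
    4: "Language",
    5: "Science",
    6: "Technology",
    7: "Arts & recreation",
    8: "Literature",
    9: "History & geography",
}

def parse_ddc(reply):
    """Pull the first three-digit Dewey code out of a model reply.

    Returns (code, main_class_name), or None if no code is found,
    so that unparseable replies can be flagged for a human to check.
    """
    match = re.search(r"\b(\d{3})(?:\.\d+)?\b", reply)
    if match is None:
        return None
    code = match.group(1)
    return code, DDC_MAIN_CLASSES[int(code[0])]

# Hypothetical reply text, standing in for whatever the model returns.
print(parse_ddc("Dewey: 939.4 (History of ancient world)"))
```

Anything that comes back as `None` would need a retry or a manual look.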

And some bonus material!

Here are the episodes on a chart (hover to see the title). Embeddings -> principal component analysis -> first 2 components (i.e. the two directions that capture the most variance) plotted. Similar episodes are "nearby". Code provided by OpenAI, I didn't do anything special here

Could this lead somewhere interesting? Thinking...

https://interconnected.org/more/2023/02/in_our_time-PCA-plot.html
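For anyone who wants to see the embeddings -> PCA -> 2D step spelled out, here is a minimal sketch. It is not the OpenAI-provided code used for the plot. It is a pure-Python stand-in using power iteration, and `pca_2d` and the toy vectors are invented for illustration. A real pipeline would more likely use a numerical library and real episode embeddings.

```python
import math
import random

def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def pca_2d(vectors, iterations=200, seed=0):
    """Project vectors onto their first two principal components."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - means[j] for j in range(dim)] for v in vectors]

    # Covariance matrix of the centered data (dim x dim).
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(dim)]
           for a in range(dim)]

    rng = random.Random(seed)
    components = []
    for _ in range(2):
        # Power iteration converges on the direction of greatest variance.
        w = [rng.uniform(-1, 1) for _ in range(dim)]
        for _ in range(iterations):
            w = mat_vec(cov, w)
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            w = [x / norm for x in w]
        eigenvalue = sum(x * y for x, y in zip(w, mat_vec(cov, w)))
        components.append(w)
        # Deflate: remove that direction so the next pass finds the second one.
        cov = [[cov[a][b] - eigenvalue * w[a] * w[b] for b in range(dim)]
               for a in range(dim)]

    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]

# Toy stand-in for episode embeddings: most variance on axis 0, some on axis 1.
toy_vectors = [[3, 1, 0], [3, -1, 0], [-3, 1, 0], [-3, -1, 0]]
points = pca_2d(toy_vectors)
```

Each row of `points` is an (x, y) pair ready for a scatter plot. The sign of each axis is arbitrary, so the chart can come out mirrored without meaning anything.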

@genmon The scatter plot is super interesting. I want to learn more about the outliers! I’ve probably listened to every single episode that’s been podcasted. Beyond a Dewey-Decimal number, could you ask it for top 3-5 tags? Then we could find “Money” topics across economics, society, history, etc.

What IoT topics cut across multiple categories?

@briansuda as it happens I did also request tags! They're unreliable, it turns out -- it seems you need a well-known controlled vocab to pin it down. And GPT is really bad at assigning multiple, different topics to the same episode

Even when they did work, browsing wasn't significantly different from using "Similar episodes"

So I think maybe playing more with embedding space is the way forward. There's a technique called TCAVs I want to try
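To make "nearby in embedding space" concrete, here is a minimal sketch of a "Similar episodes" lookup using cosine similarity. This is plain nearest-neighbour ranking, not TCAV, and not necessarily how the site does it. The three-number vectors are made up, since real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(title, embeddings, k=3):
    """Return the k titles whose vectors point most nearly the same way."""
    target = embeddings[title]
    scored = [(cosine_similarity(target, vec), other)
              for other, vec in embeddings.items() if other != title]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

# Invented toy vectors; only the episode titles are real.
toy_embeddings = {
    "Mithraism": [0.9, 0.1, 0.0],
    "Druids": [0.8, 0.2, 0.1],
    "Angkor Wat": [0.3, 0.9, 0.1],
    "John Bull": [0.0, 0.2, 0.9],
}

print(most_similar("Mithraism", toy_embeddings, k=2))
```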

@genmon “Braggoscope” is such a great name!
@garrettc hahaha it was my placeholder name when I started this last week but I think it might stick
@genmon this is fantastic! Is the metadata manually curated?

@jamesking no! that's the amazing thing -- there's tons of automation, really only possible because of GPT-3 as a web scraping and categorising tool

Details on the About page https://genmon.github.io/braggoscope/about

@genmon @jamesking How do you evaluate the goodness-of-fit of the DDC terms?

@thatandromeda @jamesking eyeball. There seem to be a few arguable placements, and one out-and-out GPT misfire that I've spotted so far (Lawrence of Arabia under History of the Ancient World)

A big problem with this technique is it's not very tuneable. So I'm looking for alternatives (still with automation)... there's a technique called TCAVs which is interesting (proximity in embedding space) but some digging required there

PodSearch Reborn - David Smith, Independent iOS Developer

@tomhannen tempted, tempted! I think it would have to be some kind of "Look Inside" style experience -- doing anything more feels like it would be crossing a line. But maybe fine-tuning an agent so I can have a _conversation_ about the topic...!
@genmon wow it’s a good reminder of how, er, unbalanced Dewey Decimal is. The subdivisions of 200 are wild. 😮
@frankieroberto yeah I made a lot of :/ faces going through that list too. It’s nicely sized for human browsing but, uh, really not great
@genmon there’s a great podcast episode about the history of alphabetical order which includes a bit about Dewey at the end: https://99percentinvisible.org/episode/alphabetical-order/transcript/
Alphabetical Order - 99% Invisible

@frankieroberto holy moly. Well that explains why the hierarchy is so skewed, what an unpleasant individual
@genmon thanks for sharing this gem

@genmon I hope that's not the only way of navigating through this data.

#Dewey is really a very bad idea in general and this dirty workaround should be replaced by more practical alternatives as soon as possible.

I really don't get why so many people still use #DDC as a hierarchy. Its structure is the frozen state of a hundred-year-old bias that hasn't reflected our reality for many decades.

https://en.wikipedia.org/wiki/Dewey_Decimal_Classification#Influence_and_criticism

https://karl-voit.at/2017/04/18/classification/

#DeweyDecimalClassification

Dewey Decimal Classification - Wikipedia

@publicvoit @genmon As a fan of In Our Time and a library assistant when I was younger, this is just perfect.

@G0OXO @genmon Yes, I know that we've got many old-school systems that are so deep in DDC that it's hard to migrate to a better concept. However, that doesn't make DDC any better. It's an old dinosaur that refuses to go extinct. 😔

On https://www.reddit.com/r/datacurator/ there is a DDC fanbase that applies it even to computer file management. 🤦‍♂️ 🤷

Data Curator • r/datacurator

A place for us less messy data hoarders.

reddit
@genmon brilliant and very useful! Also, I noticed that there are no episodes on Technology :-(
@marcsteen I think partially because of the focus on history, and partially because some episodes may be miscategorised… I will have to poke around
@genmon It's like TED talks for the grown-ups! Thank you for this.
@genmon A great idea bookmarked for a browse later.
@genmon So far today I’ve learned about Mithraism, Druids, Angkor Wat and John Bull…
@lucy_who that’s what happens!
Qualms about Melvil Dewey notwithstanding (and acknowledged by @genmon), this is a fascinating project, both as an alternative way to navigate the In Our Time archive and as an example of using GPT to find the structure in data
@genmon @feelinglistless was it you doing something similar, Stuart, or someone else?
@magslhalliday @feelinglistless somebody shared your catalogue with me, which I hadn’t seen ahead of my side project — considerably higher quality than what GPT can do!! https://feelinglistless.blogspot.com/2022/02/cataloguing-bbc-radio-4s-in-our-time.html
Cataloguing BBC Radio 4's In Our Time using Dewey Decimal Classification.

Radio   In Our Time is a weekly live BBC radio discussion programme in which the broadcaster Melvyn Bragg and several academics elucidate ...

@genmon @magslhalliday Sorry only just seen this. Were you trying to DDC something using ChatGPT? As a test I've just asked it to classify the next set of IOT titles and so far it seems scarily accurate.
@genmon @magslhalliday That said, when I asked it to just pull the raw list and classify it, it instead created a fictional list of episodes, all of which sounded perfectly plausible. LOL.

@feelinglistless @magslhalliday yeah here’s the directory!

https://genmon.github.io/braggoscope/directory

(Check the About page for how it works)

@genmon @magslhalliday Just found yours too now. I tried to get ChatGPT to generate a version of my list and it still included episodes which don't exist. I guess you get around that because you're pulling the data directly from the BBC website.
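One way to catch the invented episodes, sketched under the assumption that a list of real titles has already been scraped from the programme pages. `split_known_and_invented` is a made-up helper name, and the second "model" title below is deliberately fake.

```python
def split_known_and_invented(model_titles, scraped_titles):
    """Separate model-supplied titles into ones that exist and ones that don't.

    Matching ignores case and surrounding whitespace. Anything not found
    in the scraped list is treated as invented.
    """
    known_lookup = {t.casefold().strip(): t for t in scraped_titles}
    known, invented = [], []
    for title in model_titles:
        key = title.casefold().strip()
        if key in known_lookup:
            known.append(known_lookup[key])
        else:
            invented.append(title)
    return known, invented

# Stand-in data: a tiny "scraped" list and a model reply with one fake title.
scraped = ["Mithraism", "Angkor Wat", "John Bull"]
from_model = ["mithraism ", "The Invention of Cheese"]

known, invented = split_known_and_invented(from_model, scraped)
print(known, invented)
```

Exact matching is strict, so a title the model paraphrases would also land in `invented`. That seems like the safer failure for a catalogue.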