The entire BBC In Our Time archive browsable by Dewey-Decimal code? Yes please
I made a website to find old episodes of In Our Time to listen to. There are almost a thousand; it's my starting point for any new topic
Very early, suggestions welcome
There's heavy use of GPT-3 in making this work! Both in extracting machine-readable data, and classifying episodes by library code
I feel like this programmatic use of LLMs is where AI gets really interesting
Details on the About page https://genmon.github.io/braggoscope/about
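The About page has the real details, but the gist of using an LLM as a categorising tool can be sketched like this. The prompt wording, the helper functions, and the example reply below are all illustrative assumptions, not the site's actual code:

```python
# Hedged sketch: prompting an LLM to assign a single Dewey Decimal
# class to an episode, then parsing its reply into structured data.
# build_prompt and parse_reply are hypothetical helpers for illustration.

def build_prompt(title: str, synopsis: str) -> str:
    """Ask the model for one Dewey Decimal code plus a label."""
    return (
        "Assign a Dewey Decimal classification to this radio episode.\n"
        f"Title: {title}\n"
        f"Synopsis: {synopsis}\n"
        "Answer in the form: <code> - <label>"
    )

def parse_reply(reply: str) -> tuple[str, str]:
    """Split a reply like '530 - Physics' into (code, label)."""
    code, _, label = reply.partition(" - ")
    return code.strip(), label.strip()

prompt = build_prompt(
    "The Photon",
    "Melvyn Bragg and guests discuss the quantum of light.",
)
# A model reply might look like this (example value, not a real API call):
code, label = parse_reply("530 - Physics")
print(code, label)  # 530 Physics
```

The point is that the model's free-text answer gets forced into a machine-readable shape, which is what makes the directory browsable.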
And some bonus material!
Here are the episodes on a chart (hover to see the title). Embeddings -> principal component analysis -> first 2 components (i.e. most significant) plotted. Similar episodes are "nearby". Code provided by OpenAI, I didn't do anything special here
Could this lead somewhere interesting? Thinking...
https://interconnected.org/more/2023/02/in_our_time-PCA-plot.html
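The pipeline behind that chart (embeddings, then PCA, then keep the first two components) can be sketched in a few lines. Random vectors stand in for the real OpenAI embeddings here, and the 1536 dimension is an assumption based on OpenAI's ada-002 embedding size:

```python
# Hedged sketch: reduce episode embeddings to their first two principal
# components so that similar episodes land nearby on a 2D scatter plot.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for ~1000 episode embeddings (real ones would come from the API)
embeddings = rng.normal(size=(1000, 1536))

pca = PCA(n_components=2)
points = pca.fit_transform(embeddings)  # (1000, 2): x/y for the plot

print(points.shape)
```

Each row of `points` is one episode's position on the chart; the two retained components are the directions of greatest variance in the embedding space.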
@genmon The scatter plot is super interesting. I want to learn more about the outliers! I’ve probably listened to every single episode that’s been podcasted. Beyond a Dewey-Decimal number, could you ask it for top 3-5 tags? Then we could find “Money” topics across economics, society, history, etc.
What IoT topics cut across multiple categories?
@briansuda as it happens I did also request tags! They're unreliable, it turns out -- it seems you need a well-known controlled vocab to pin it down. And GPT is really bad at assigning multiple, different topics to the same episode
Even when they did work, browsing wasn't significantly different from using "Similar episodes"
So I think maybe playing more with embedding space is the way forward. There's a technique called TCAVs I want to try
@jamesking no! that's the amazing thing -- there's tons of automation, really only possible because of GPT-3 as a web scraping and categorising tool
Details on the About page https://genmon.github.io/braggoscope/about
@thatandromeda @jamesking eyeball. There seem to be a few arguable placements, and one out-and-out GPT misfire that I've spotted so far (Lawrence of Arabia under History of the Ancient World)
A big problem with this technique is it's not very tuneable. So I'm looking for alternatives (still with automation)... there's a technique called TCAVs which is interesting (proximity in embedding space) but some digging required there
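One simple variant of that proximity-in-embedding-space idea (a mean-difference concept direction rather than TCAV's trained linear probe) might look like this. The vectors are random stand-ins, and the "money" framing is borrowed from the earlier reply, purely for illustration:

```python
# Hedged sketch: derive a "concept" direction in embedding space from a
# few positive examples, then score every episode by how far it projects
# along that direction. Not the site's actual code.
import numpy as np

rng = np.random.default_rng(1)
episodes = rng.normal(size=(100, 64))        # all episode embeddings
concept_examples = rng.normal(size=(5, 64))  # embeddings of "money"-ish episodes
background = rng.normal(size=(20, 64))       # random contrast set

# Concept vector: mean of the concept examples minus mean of the background
direction = concept_examples.mean(axis=0) - background.mean(axis=0)
direction /= np.linalg.norm(direction)

scores = episodes @ direction                # projection of each episode
top = np.argsort(scores)[::-1][:5]           # five most "money"-like episodes
print(top)
```

Unlike a one-shot GPT classification, the direction itself is tuneable: change the example episodes and the whole ranking shifts with them.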
@genmon I hope that's not the only way of navigating through this data.
#Dewey is really a very bad idea in general, and this dirty workaround should be replaced by more practical alternatives as soon as possible.
I really don't get why so many people still use #DCC as a hierarchy. Its structure is a frozen snapshot of hundred-year-old bias that hasn't reflected our reality for many decades.
https://en.wikipedia.org/wiki/Dewey_Decimal_Classification#Influence_and_criticism
@G0OXO @genmon Yes, I know that we've got many old-school systems that are so deep down in DCC that it's hard to migrate to a better concept. However, that doesn't make DCC any better. It's an old dinosaur that refuses to get extinct. 😔
On https://www.reddit.com/r/datacurator/ there is a fanbase of DCC, applying it even for computer file management. 🤦‍♂️ 🤷
@feelinglistless @magslhalliday yeah here’s the directory!
https://genmon.github.io/braggoscope/directory
(Check the About page for how it works)