Terence Eden 10h ago

I think I have a genuine need for an #LLM. Can someone tell me if this is possible?

@openbenches contains ~40k text inscriptions.

Someone wants to know how many are dedicated to men, how many to women.

"To Grandma Sylvia" is obvious.
"To R Smith" is not.

Could an AI give a rough estimate of the gender of a subject?

Could it ignore text relating to who the inscription is from? "To Granny from Dave and Alice".

What would be the most accurate / cheapest / fastest / easiest tool to work with?

sauc3 10h ago

@Edent @openbenches

As others have pointed out, LLMs are something I would not trust for this. Since hallucinations are inevitable, that tells me it is the wrong tool for the job, unless you don't care about the accuracy of the results.

Someone else posted some code that will do this. I think regex or some *NIX command line tools could also accomplish this task.

But if accuracy is what you are looking for, then large language models are not the tool for the job.
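
For illustration only, here is a minimal sketch of what the regex idea might look like in Python. It is not the code posted elsewhere in the thread. The word lists and the `guess_gender` function name are hypothetical, and the "from" handling assumes the donor always comes after the subject, which will not hold for every inscription.

```python
import re

# Hypothetical, deliberately tiny word lists; real data would need far more.
FEMALE = {"grandma", "granny", "nan", "nana", "mum", "mother", "wife",
          "sister", "daughter", "aunt", "mrs", "miss", "ms"}
MALE = {"grandad", "grandpa", "dad", "father", "husband",
        "brother", "son", "uncle", "mr"}

# Drop everything from the first standalone "from" onward,
# so the names of whoever donated the bench are not counted.
FROM_CLAUSE = re.compile(r"\bfrom\b.*", re.IGNORECASE | re.DOTALL)


def guess_gender(inscription: str) -> str:
    """Return 'female', 'male', or 'unknown' based on kinship words and titles."""
    subject = FROM_CLAUSE.sub("", inscription)
    words = set(re.findall(r"[a-z]+", subject.lower()))
    has_female = bool(words & FEMALE)
    has_male = bool(words & MALE)
    if has_female and not has_male:
        return "female"
    if has_male and not has_female:
        return "male"
    # No cue at all, or cues for both (e.g. a bench for a couple).
    return "unknown"


for text in ["To Grandma Sylvia", "To R Smith", "To Granny from Dave and Alice"]:
    print(text, "->", guess_gender(text))
```

A sketch like this only catches inscriptions with an explicit title or kinship word. "To R Smith" stays "unknown", and first names are not handled at all, so it gives a lower bound rather than a full count.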

Terence Eden 10h ago

@sauc3 I think you're misunderstanding me. I want to use the LLM to extract the names. Sure, it might hallucinate "Mr In Loving Memory" - but I consider that unlikely.

I simply can't see how a regex would work on such massively unstructured data.

sauc3

@Edent

I am not a regex guru, and the ethical downside of using large language models outweighs any other considerations for me.

But I guess if you don't care about ethics, do whatever you want.

Terence Eden 10h ago

@sauc3 I'm very happy to train my own classifier for this. Any suggestions on where to start?

@sauc3 oh, sorry, I thought you had some knowledge of the field?