I think I have a genuine need for an #LLM. Can someone tell me if this is possible?

@openbenches contains ~40k text inscriptions.

Someone wants to know how many are dedicated to men, how many to women.

"To Grandma Sylvia" is obvious.
"To R Smith" is not.

Could an AI give a rough estimate of the gender of a subject?

Could it ignore text relating to who the inscription is from? "To Granny from Dave and Alice".

What would be the most accurate / cheapest / fastest / easiest tool to work with?

@Edent @openbenches

Not in any way I'd trust. LLMs tend to have huge gender biases that have, as of yet, gone unaddressed.

They are machines designed to make up plausible-sounding responses that are difficult to concretely prove or falsify.

I don't recommend it.

@ajroach42 I guess that's part of the thing I'm trying to understand.

I can see how a biased system might insist that all "Sam"s are default male - or might not understand non-English languages.

But if I can see the assumptions that it makes, that could be very helpful.

@Edent

Generally speaking, we don't have any insight into the specifics. These machines are essentially black boxes.

What we know about the things they get wrong comes from people who are studying the outputs.

But the outputs are inconsistent.

@ajroach42 Yes, it's the outputs I'd like to study.

E.g.
Sylvia - Female 100% confidence
Joe - Male 100%
Jo - Female 70%
Sam - Male 80%
etc.

I appreciate it might be different each run, but it would allow me to see how dodgy it was.

@Edent

I haven't done that kind of testing personally, but I know there are some python utilities designed to do that kind of bias testing.

I haven't done much ML work since OpenAI hit the scene, so the specific analytical stuff I've done is very dated.

I've mostly stepped back and depended on studies others are posting.

@Edent @openbenches Not able to send an in-depth reply now, but this is something in my wheelhouse. I’ve worked on a similar project before (though at a smaller scale) and would be happy to offer my knowledge, test viability, and help you implement it if it looks promising.
@Edent @openbenches We wrote some code for that: https://github.com/mysociety/gender-detect , could add some regex on top to spot if it’s from/to?
We’ve used LLMs to create regex: https://www.mysociety.org/2025/08/12/using-llms-to-write-categorisation-rules/
But I wouldn’t trust it to do the actual thing.
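A minimal sketch of that "regex on top" idea, assuming inscriptions roughly follow a "To X ... from Y" shape (the patterns are illustrative guesses, not tuned against real OpenBenches data):

```python
import re

# Split an inscription into a dedicatee ("to") part and a dedicator
# ("from") part before running any name-to-gender lookup.
# The preamble and separator words are guesses, not an exhaustive list.
TO_FROM = re.compile(
    r"^(?:in\s+(?:loving\s+)?memory\s+of|to|for)?\s*"  # optional preamble
    r"(?P<to>.+?)"                                     # dedicatee
    r"(?:\s*[,.]?\s*(?:from|by)\s+(?P<frm>.+))?$",     # optional dedicator
    re.IGNORECASE | re.DOTALL,
)

def split_inscription(text: str) -> dict:
    m = TO_FROM.match(text.strip())
    if not m:
        return {"to": text.strip(), "from": None}
    frm = (m.group("frm") or "").strip(" ,.")
    return {"to": m.group("to").strip(" ,."), "from": frm or None}

split_inscription("To Granny from Dave and Alice")
# -> {"to": "Granny", "from": "Dave and Alice"}
```

It would miss plenty of phrasings, but it keeps the "from" names out of the gender counts for the common patterns.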
@dracos that is detailed and useful! Cheers.
@Edent There's technology for this that predates LLMs by many years; here's an evaluation and discussion of some options: https://doi.org/10.7717/peerj-cs.156

@benjamingeer thanks - that's useful!

Sadly it doesn't help with extracting names from unstructured data.

@Edent @openbenches

As others have pointed out, LLMs are something I would not trust for the job. Since hallucinations are inevitable, that tells me it is the wrong tool for the job, unless you don't care about the accuracy of the results.

Someone else posted some code that will do this. I think regex or some *NIX command line tools could also accomplish this task.

But if accuracy is what you are looking for, then large language models are not the tool for the job.

@sauc3 I think you're misunderstanding me. I want to use the LLM to extract the names. Sure, it might hallucinate "Mr In Loving Memory" - but I consider that unlikely.

I simply can't see how a regex would work on such massively unstructured data.

@Edent

I am not a regex guru, and the ethical downside of using large language models outweighs any other considerations for me.

But I guess if you don't care about ethics, do whatever you want.

@sauc3 I'm very happy to train my own classifier for this. Any suggestions on where to start?
@sauc3 oh, sorry, I thought you had some knowledge of the field?

@Edent If the inscriptions are in string form, not image form, this should be trivial, depending on your acceptable error rate and on the balance you will accept between not providing a response when unsure and providing an erroneous one.

For 40k text inscriptions, you can even use <1B parameter models, here's a test with facebook's bart zero-shot classifier (which is ancient by today's LLM standards)

EDIT: model link https://huggingface.co/facebook/bart-large-mnli

@openbenches

@budududuroiu Yes, that's part of it. Could it also be used to extract a "to" and "from" classification?

@Edent definitely, many modern LLMs are trained (and then constrained during inference) to be able to produce structured JSON output, so you can just prompt an LLM like "tell me the gender of this person who the plaque is dedicated to, and, if available, who dedicated it to them", and then provide a JSON schema you want in return.

This webpage is a 10,000ft view of Structured Output for LLMs https://openrouter.ai/docs/guides/features/structured-outputs
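A minimal sketch of what that request could look like, assuming OpenRouter's OpenAI-compatible chat API (the model name and schema fields are illustrative, and no network call is made here):

```python
import json

# JSON schema the model's reply must conform to. Field names are
# illustrative, not a fixed OpenBenches format.
SCHEMA = {
    "name": "bench_dedication",
    "schema": {
        "type": "object",
        "properties": {
            "dedicated_to": {"type": "string"},
            "gender_guess": {"type": "string", "enum": ["male", "female", "unknown"]},
            "confidence": {"type": "number"},
            "dedicated_from": {"type": ["string", "null"]},
        },
        "required": ["dedicated_to", "gender_guess", "confidence"],
    },
}

def build_request(inscription: str) -> dict:
    """Build an OpenAI-style chat request enforcing structured output."""
    return {
        "model": "openai/gpt-4o-mini",  # any structured-output-capable model
        "messages": [
            {"role": "system", "content": "Extract who the bench is dedicated to, "
             "a gender guess with confidence, and who it is from, if stated."},
            {"role": "user", "content": inscription},
        ],
        "response_format": {"type": "json_schema", "json_schema": SCHEMA},
    }

# The reply comes back as a JSON string in the message content, e.g.:
reply = ('{"dedicated_to": "Granny", "gender_guess": "female", '
         '"confidence": 0.95, "dedicated_from": "Dave and Alice"}')
parsed = json.loads(reply)
# parsed["gender_guess"] -> "female"
```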

@budududuroiu Interesting, thanks!
@Edent Yup, even though all the plaque texts could probably fit in a newer model's context window, I'd suggest calling an LLM separately for each of them.

@Edent are you sure that you need LLM?

If I were to find a solution I'd probably look at https://github.com/ajdavis/proporti.onl and expand it with an ML model. An LLM looks a bit too strong for this.


@xgebi I think I need the LLM for extracting the names from unstructured data.

The classifier looks like it can be done with other things.

@Edent @openbenches You could probably use something like a BERT zero-shot classifier to do this with much less resource usage than an LLM.
@JadedBlueEyes Hit me up with a link / tutorial?

@Edent This seems like a reasonably good example of label-based classification: https://huggingface.co/blog/Ihor/refreshing-zero-shot-classification

Here's another example: https://jaketae.github.io/study/zero-shot-classification/

There are a variety of models, usually BERT (and derivative) models are the easiest to play with. Keywords you probably want to look for are NLI, Zero shot or one shot, entailment.
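A minimal zero-shot sketch along those lines, using the facebook/bart-large-mnli model mentioned upthread (the first run downloads the model weights, and the candidate labels are just a first guess):

```python
from transformers import pipeline

# An NLI model scores each candidate label against the inscription;
# no fine-tuning on bench data is involved.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "In loving memory of Grandma Sylvia, who loved this view",
    candidate_labels=[
        "dedicated to a woman",
        "dedicated to a man",
        "not dedicated to a person",
    ],
)
# result["labels"] comes back sorted by score, highest first,
# and result["scores"] holds the matching probabilities.
print(result["labels"][0], round(result["scores"][0], 2))
```

The scores give the per-name confidence style of output asked for upthread, though they are NLI entailment probabilities rather than calibrated gender statistics.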


@Edent @openbenches I would give OpenRouter (where you can easily use different models) a try. With $5 of credit you can do a lot. Every request costs around $0.0001 (depending on the model). It’s quite easy to integrate with (same OpenAI API). I use it daily to automatically generate alt text, descriptions and hashtags from images (when I automatically post to socials from @bookcorners )

Models I recommend: gpt-5.4-nano or gpt-5.4-mini

@andreagrandi Cheers, I'll take a look.
@Edent @openbenches AI isn't magic. What is it supposed to do with "R Smith" that I can't? Are you looking to try to research the family that sponsored the bench?

@aburka Yes, that's the point.
I want it to classify where there is confidence, and ignore where there is not.

Someone asked me if there were more benches dedicated to men or to women. That's all.

@Edent There were tools for this long before LLMs, like https://github.com/miriamposner/derive_gender/blob/master/derive-gender-from-a-column-of-first-names.md The question is how big your margin of error can be. If you want to know whether it's roughly 70:30, it could work. If you want more exact numbers, it'll be difficult. There are papers looking into this from before LLMs: https://www.medrxiv.org/content/10.1101/2024.01.30.24302027v1.full I didn't look for anything LLM-related, because the underlying issue of name/gender ambiguity exists no matter the technology. Tell me the gender of Jay, Andrea, Ridley, Robin etc.
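The lookup-table approach those tools use fits in a few lines. The counts below are invented for illustration (real tools draw on census or birth-record name data), and the threshold controls when to abstain on ambiguous names:

```python
# Per-name occurrence counts as (male, female). These numbers are
# made up for illustration only.
NAME_COUNTS = {
    "sylvia": (5, 9800),
    "joe":    (12000, 150),
    "jo":     (900, 2100),
    "robin":  (4800, 5200),
}

def guess_gender(name: str, threshold: float = 0.8) -> tuple[str, float]:
    """Return (label, share); abstain as "unknown" below the threshold."""
    m, f = NAME_COUNTS.get(name.lower(), (0, 0))
    total = m + f
    if total == 0:
        return ("unknown", 0.0)
    share = max(m, f) / total
    if share < threshold:
        return ("unknown", share)
    return ("male" if m > f else "female", share)

guess_gender("Robin")  # -> ("unknown", 0.52)
```

The threshold is exactly the "classify where there is confidence, ignore where there is not" knob: raise it and more names fall into "unknown".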
@dahie thanks - that's a good starting point.
@Edent sidestepping the question a little bit, given you’re not aiming for complete accuracy, could you randomly sample 100 or so and work out the proportion manually? A bit tedious I’m sure but maybe no more so than trying to train your own model. I’d be happy to do a few if you wanted to outsource the work.
@bencc That might be one approach. Given the geographic distribution, I'm not sure how feasible it is.
You're very welcome to give it a go if you want?
No pressure though!

@Edent Well I had a quick go, 20 selected at random broke down as:

both: 4
f: 5
m: 9
n/a: 2

A second batch of 20 to compare broke down as:

both: 4
f: 5
m: 8
n/a: 2
unknown: 1 (ashley)

Surprisingly consistent (once corrected for the missing one). I'd be interested in how those compare with a bigger analysis.
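As a rough check on what a sample this size can pin down, a normal-approximation 95% confidence interval for the male share in the first batch (9 of 20) comes out quite wide:

```python
import math

# 95% confidence interval for a proportion using the normal
# approximation: p-hat +/- 1.96 * sqrt(p * (1 - p) / n).
def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return (max(0.0, p - half), min(1.0, p + half))

low, high = proportion_ci(9, 20)
print(f"male share: 45% (95% CI {low:.0%}-{high:.0%})")
# prints: male share: 45% (95% CI 23%-67%)
```

With n=20 the interval spans from a female majority to a male majority, so a few hundred sampled benches would be needed to get the split within a handful of percentage points.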

@Edent @openbenches Oh. This is tickling something in the brain pan about trying to determine if a company name is a sole trader or not from a couple of years ago. IIRC I tried to do it with plain old NLTK, which incorporated tokenising and tagging part of speech, including proper name extraction. That might get you so far. Then other techniques for guesstimating gender. I’ll see if the code is in any way usable! In fact I’ll first see if I can find the code…
@Edent @openbenches Haven’t found my code, but pretty sure this was the sort of code I was using… https://www.geeksforgeeks.org/nlp/nlp-extracting-named-entities/

@gilesdring @Edent I did something similar with the charity register a while ago: https://dkane.net/2018/names-shared-by-genders/

The list of charity trustees has (or at least used to have) some entries with an unambiguous title, which I used to extract a list of names.
