@osma At #SWIB23 we talked with @SemAntiKast about some tests you did with LLM-based classification, if I remember correctly. Do I have that right? If so, did you document your tests, and did you try different approaches (reasoning models etc.)?

@hauschke @SemAntiKast Hmm, now I'm not sure what discussion you're referring to.

We've mainly tried LLMs for metadata extraction, not classification as such. I gave a talk about it at #SWIB24 with Pierre Beauguitte from Norway (and also a lightning talk at #SWIB23).

Code for the experiments can be found here: https://github.com/NatLibFi/FinGreyLit

Most experiments involve fine-tuning smallish LLMs. Haven't tried reasoning models yet.

GitHub - NatLibFi/FinGreyLit: Data set of Finnish grey literature, containing curated Dublin Core style metadata and links to original PDF publications
