Submission deadline for the *Workshop on Citation Extraction and Parsing (CiteX 2026)* at DIPF Frankfurt has been moved to 1 February.

CiteX 2026 offers an interdisciplinary forum on automated citation extraction and parsing.

We invite submissions of extended abstracts (1250–1500 words) for presentations, posters, or hands-on sessions.
Submission via our website: https://sites.google.com/view/workshop-on-citation-extractio/startseite

• Submission deadline: 1 February 2026
• Notification of acceptance: 1 March 2026
• Camera-ready version: 31 March 2026

Topics of interest include (but are not limited to):
• Automated extraction and parsing of references
• Creation and sharing of gold standards and test datasets
• Standardization and interoperability of citation data
• QA and validation of extracted references
• Comparison of LLM-based and tool-based extraction pipelines

#ReferenceExtraction #CfP #Frankfurt

Christian Boulanger Sep 16, 2024

@osma @storytracer Hi, I just found this old thread. We're currently working on a #referenceextraction & #evaluation workflow involving #LLMs, measuring their performance against a hand-annotated dataset of older scholarly articles with #footnotes. Untrained #GROBID performs very badly, but that does not mean it will still do so when properly trained with a good dataset.

Christian Boulanger Sep 16, 2024

Do you want to run the #GROBID PDF-to-#TEI conversion library/server with #Apptainer, for example for #ReferenceExtraction? There was a problem converting the #Docker image, but here's how to solve it: https://github.com/kermitt2/grobid/issues/1150#issuecomment-2350942263

Apptainer Support? `stat ~/grobid-service/bin/grobid-service: no such file or directory` · Issue #1150 · kermitt2/grobid

I'm attempting to run Grobid in a HPC (high performance compute) environment, they only support Apptainer. $ apptainer pull docker://grobid/grobid:0.8.0 # ✅ -- creates grobid_0.8.0.sif $ apptainer ...

Andreas Wagner May 2, 2023

(Hybrid) Workshop: Extracting Heterogeneous Reference Data, 15/16 May 2023, #mpilhlt Frankfurt/M., Germany.

Registration is open, programme is online: https://mpilhlt.github.io/reference-extraction/workshop-2023/programme/

Interested in extracting literature references from historical texts, scholarly literature in the humanities, or documents in low-resource languages? Want to see how CRF-based approaches compare to LLM-based ones? Want to make sure the challenges you are struggling with are on developers' roadmaps? Want to learn about some use cases?

Then please join us at the workshop.

#DigitalHumanities #NLP #NaturalLanguageProcessing #ReferenceExtraction #LLM #Bibliometrics

Programme - New Approaches for Extracting Heterogeneous Reference Data

Andreas Wagner Jan 24, 2023

Call for Participation: (Hybrid) Workshop on Extracting Heterogeneous Reference Data, 15/16 May 2023, #mpilhlt Frankfurt/M., Germany.

Interested in extracting literature references from historical texts, scholarly literature in the humanities, or documents in low-resource languages? Want to apply your language model to a new use case and enjoy the gratitude of dozens of humanities, law and social sciences scholars? Have a use case or training data?
Please have a look at our CfP:

https://mpilhlt.github.io/reference-extraction/workshop-2023/cfp

#DigitalHumanities #NLP #NaturalLanguageProcessing #ReferenceExtraction #LLM #Bibliometrics

Call for Papers - New Approaches for Extracting Heterogeneous Reference Data

Andreas Wagner Nov 17, 2022

On the other site, I tried from time to time to give a glimpse into what #DigitalHumanities people do all day, under the hashtag #DHFromScratch. I'll do the same here. Again, it's lots of planning, discussion and counseling, a bit of presenting and another bit of actual coding:

Today, there's a workshop coming up about a #SKOS hosting software called #SkoHub, in which I'll do a bit about #ReconciliationAPI, an implementation of which I've been working on (#NodeJS, #ExpressJS, #ElasticSearch). Hopefully we'll finish in time for me to be able to watch some of the stuff at the #nodes2022 conference.

So far, the week also had planning for a common DH initiative and for a Shared Task in DH (#ReferenceExtraction - if you're into this, pls get in touch). There was a discussion about #Ontology for historical administrative entities, and one about which #SQL server to choose for a project, some #XQuery coding and tomorrow a discussion about #NLP methods.