I'm pleased to announce that GROBID 0.9.0 is out. Release notes and highlights below.
1/6

1/ New extraction coverage: conflict of interest & author contribution statements; figures, tables and equations from back/annex sections; URLs from PDF annotations; ORCID identifiers fetched via Crossref when absent from the source document.
2/ Native Linux ARM64 support.
2/6
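
For readers wondering what an ORCID lookup via Crossref involves: the public Crossref REST API serves work metadata at `https://api.crossref.org/works/{DOI}`, and each entry in the `author` list of the returned `message` may carry an `ORCID` field, given as a URL. The snippet below is only a minimal sketch of reading that field, not GROBID's actual code. The helper name `orcids_from_crossref` and the sample record are hypothetical, made up for illustration.

```python
import re

# An ORCID iD is four groups of four characters; the last one may be 'X'.
ORCID_RE = re.compile(r"(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcids_from_crossref(work):
    """Map 'Given Family' -> bare ORCID iD for the authors that have one.

    `work` is the `message` object of a Crossref /works/{DOI} response.
    Hypothetical helper for illustration; not part of GROBID.
    """
    found = {}
    for author in work.get("author", []):
        # Crossref gives the iD as a URL, so keep only the trailing identifier.
        match = ORCID_RE.search(author.get("ORCID", ""))
        if match:
            parts = (author.get("given"), author.get("family"))
            name = " ".join(p for p in parts if p)
            found[name] = match.group(1)
    return found


# Made-up record shaped like a Crossref response (placeholder data).
sample = {
    "author": [
        {"given": "Ada", "family": "Example",
         "ORCID": "http://orcid.org/0000-0002-1825-0097"},
        {"given": "Bob", "family": "Sample"},  # no ORCID on record
    ]
}
print(orcids_from_crossref(sample))
```

Authors without an `ORCID` field are simply skipped, so the result only contains identifiers that Crossref actually has on record.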

Luca Apr 14

Multi-architecture Docker images (amd64 + arm64) are now available, enabling native deployment on Apple Silicon and ARM cloud instances.
3/6

3/ Revised Crossref consolidation, developed in collaboration with the Crossref team — improved rate limit handling, better error recovery, and more robust reference matching.
4/6
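
To make "rate limit handling" and "error recovery" concrete, here is a generic retry-with-backoff sketch in Python. It is not GROBID's implementation, which this thread does not describe. It assumes the server answers HTTP 429 when a client is over its limit and may send a `Retry-After` header in seconds. The names `get_with_backoff`, `fetch` and `sleep` are hypothetical, and `fetch` stands in for any HTTP client.

```python
import time


def get_with_backoff(fetch, url, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status, headers, body), retrying on 429 and 5xx.

    Hypothetical helper for illustration; `fetch` can wrap any HTTP client.
    """
    for attempt in range(max_tries):
        status, headers, body = fetch(url)
        if status == 200:
            return body
        if status != 429 and status < 500:
            # Other client errors will not go away by retrying.
            raise RuntimeError(f"non-retryable HTTP {status} for {url}")
        if attempt == max_tries - 1:
            break
        # Default: exponential backoff (1s, 2s, 4s, ...).
        delay = base_delay * 2 ** attempt
        try:
            # Prefer the server's hint when it is a plain number of seconds.
            delay = float(headers["Retry-After"])
        except (KeyError, ValueError):
            pass
        sleep(delay)
    raise RuntimeError(f"gave up on {url} after {max_tries} tries")


# Offline demo: a fake client that is rate-limited, then errors, then succeeds.
responses = iter([(429, {"Retry-After": "3"}, None), (503, {}, None), (200, {}, "ok")])
waits = []
result = get_with_backoff(lambda url: next(responses),
                          "https://api.crossref.org/works/10.5555/example",
                          sleep=waits.append)
print(result, waits)
```

Passing `sleep` in as a parameter keeps the demo instant and makes the waiting schedule easy to inspect.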

4/ New pluggable NLP engines: Lingua for language identification, Blingfire for sentence segmentation — both available as drop-in alternatives to the existing defaults.
5/6

5/ Infrastructure: JDK 21, Gradle 9, TensorFlow 2.17 (Python 3.10–3.11), pdfalto 0.6.0, wapiti 1.5.1, virtualenv/conda support for DeLFT.
6/ Full release notes → https://github.com/kermitt2/grobid/releases/tag/0.9.0

#GROBID #OpenSource #NLP #ScholarlyInfrastructure
6/6

Release 0.9.0 · grobidOrg/grobid

What's Changed. Added: Conflict of interest and author contributions statement extraction in header and segmentation models (#1319); Extract figures, tables and equations from back/annex sections (#1215)...

GitHub