I just received my copy of #aicon by @emilymbender and @alex. I'm really looking forward to going through this book.
Emily M. Bender & Alex Hanna (2025). The AI con: How to fight big tech’s hype and create the future we want.
| web | https://f-krueger.github.io/ |
| orcid | https://orcid.org/0000-0002-7925-3363 |
We just published the first dataset created within our @Textplus cooperation project. For this dataset, we manually transcribed the tables listing arriving spa and bathing guests from the Swinemünder Badeanzeiger, published between 1910 and 1932. The data is available under CC0 at https://zenodo.org/records/14603757
This dataset contains the ground-truth annotation for extracting and structuring information from the tables of the historical newspaper "Swinemünder Badeanzeiger". The newspaper was obtained from the Digitale Bibliothek Mecklenburg-Vorpommern: https://www.digitale-bibliothek-mv.de/viewer/toc/PPN636776093/ The data was created by selecting one "Swinemünder Badeanzeiger" image per year and manually transcribing its content.

The dataset is structured by the newspaper's publication year. Each year's folder contains a folder named after the original image ID, which includes the following data:
table_[running_number].jpg: image of the segmented table
table_[running_number]_annotation.json: data extracted and structured from the segmented image by manual transcription
table_[running_number]_index_connected.json: list connecting each entry to its corresponding table rows, to preserve multi-row entries

For each entry, a JSON object was created and added to table_[running_number]_annotation.json. It consists of the following fields:
input: Transcription of the original row, including markers for columns
Nummer: The sequence number of the row, as extracted from the input field
Vorname: The first name, if it exists, otherwise null
Nachname: The last name, if it exists, otherwise null
Titel: The (academic) title, if it exists, otherwise null
Beruf: The profession, if it exists, otherwise null
Sozialer Stand: The social status, if it exists, otherwise null
Begleitung: Any companions, such as family members or servants, if they exist, otherwise null
Wohnort: The city the person(s) arrived from, if it exists, otherwise null
Wohnung: The local residence, such as a hotel, pension, or vacation home, if it exists, otherwise null
Personenanzahl: The total number of persons represented by this entry

In addition to the separate annotation files, the file swinebad_groundtruth.json contains a complete list of all entries to simplify data analysis.
To this end, each entry was supplemented with the following field:
date: The publication date of the newspaper issue in which the entry appeared

The following example shows an entry obtained from the fourth line of the table published at https://www.digitale-bibliothek-mv.de/viewer/image/PPN636776093_1910/1/LOG_0003/
{ "input": "973 | Dr. Auerbach, Richard, Journalist, mit Frau | „ | Villa Kaiser Wilhelm | 2", "Nummer": "973", "Vorname": "Richard", "Nachname": "Auerbach", "Titel": "Dr.", "Beruf": "Journalist", "Sozialer Stand": null, "Begleitung": "mit Frau", "Wohnort": "Berlin", "Wohnung": "Villa Kaiser Wilhelm", "Personenanzahl": "2", "date": "1910-06-06" }

You are welcome to cite the 'Digitale Bibliothek MV / Universität Greifswald' (+ URN for digital publications or the shelfmark for printed publications) as the source for the images.
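For readers who want to start working with the entries, here is a minimal Python sketch. It assumes (not confirmed by the dataset description) that swinebad_groundtruth.json is a JSON array of objects shaped like the example entry, and it embeds that single example so it runs without downloading anything. The helper names split_input and guests_per_year are hypothetical. It shows two things: splitting the raw "input" row on its "|" column markers, and summing Personenanzahl (stored as a string) per publication year.

```python
import json
from collections import defaultdict

# The example entry from the dataset description. The real
# swinebad_groundtruth.json is ASSUMED to be a JSON array of such objects.
SAMPLE = """[
  {"input": "973 | Dr. Auerbach, Richard, Journalist, mit Frau | „ | Villa Kaiser Wilhelm | 2",
   "Nummer": "973", "Vorname": "Richard", "Nachname": "Auerbach", "Titel": "Dr.",
   "Beruf": "Journalist", "Sozialer Stand": null, "Begleitung": "mit Frau",
   "Wohnort": "Berlin", "Wohnung": "Villa Kaiser Wilhelm",
   "Personenanzahl": "2", "date": "1910-06-06"}
]"""


def split_input(raw: str) -> list[str]:
    """Hypothetical helper: split a transcribed row on its '|' column markers."""
    return [cell.strip() for cell in raw.split("|")]


def guests_per_year(entries: list[dict]) -> dict[str, int]:
    """Hypothetical helper: sum Personenanzahl per publication year.

    Personenanzahl is stored as a string, so it is converted to int;
    entries with a missing or non-numeric value are skipped (an assumption).
    """
    totals: dict[str, int] = defaultdict(int)
    for entry in entries:
        year = entry["date"][:4]  # ISO date such as "1910-06-06"
        count = entry.get("Personenanzahl")
        if count and count.isdigit():
            totals[year] += int(count)
    return dict(totals)


entries = json.loads(SAMPLE)
print(split_input(entries[0]["input"]))
print(guests_per_year(entries))
```

To use the full file instead, the entries could be loaded with `json.load(open("swinebad_groundtruth.json", encoding="utf-8"))`. Note that in the example the raw Wohnort column reads "„" while the structured Wohnort field holds "Berlin", so analyses are probably better based on the structured fields than on the split raw row.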
Today, with #ORDSMV, we organized a full-day workshop on geospatial and marine-related data at the Leibniz Institute for Baltic Sea Research Warnemünde.
More information at:
https://ords-mv.github.io/blog/ords-meets-the-sea/
w/ @AnjaEggert, M. Schröder, C. Hassenrück and M. Reichelt
📢 🤩 Our recent article has been published by @QSS_ISSI
w/ D. Schindler, T. Hossain and S. Spors
We investigate the quality of software citation metadata in terms of accuracy and completeness. We found that current citation practices by authors, publishers and databases are unsuited to meeting two needs: identifying the software that was used and giving credit to its developers.
Abstract. Software is a central part of modern science, and knowledge of its use is crucial for the scientific community with respect to reproducibility and attribution of its developers. Several studies have investigated in-text mentions of software and their quality, while the quality of formal software citations has only been analyzed superficially. This study performs an in-depth evaluation of formal software citation based on a set of manually annotated software references. It examines which resources are cited for software usage, to what extent they allow proper identification of software and its specific version, how this information is made available by scientific publishers, and how well it is represented in large-scale bibliographic databases. The results show that software articles are the most cited resource for software, while direct software citations are better suited for identifying software versions. Moreover, we found current practices by both publishers and bibliographic databases to be unsuited to represent these direct software citations, hindering large-scale analyses such as assessing software impact. We argue that current practices for representing software citations (the recommended way to cite software according to current citation standards) stand in the way of their adoption by the scientific community, and urge providers of bibliographic data to explicitly model scientific software.

Peer Review: https://www.webofscience.com/api/gateway/wos/peer-review/10.1162/qss_a_00309
📢 Happy to share that @chriwer's first paper has been accepted at this year's @semdh workshop at @eswc_conf
Together with Zacharias Shoukry and Soham Al-Suadi, we created a corpus of biblical names to study additions, omissions and variations across different manuscripts by integrating data from the NTVMR, @dbpedia and @FactGrid
https://sigmoid.social/@semdh/112247487542046637
All data (https://zenodo.org/records/10816647) and code (https://github.com/chr-werner/SemDH2024-GreekNewTestamentNames) are publicly available.
🥳 The list of papers accepted for publication at #SemDH2024 is online! Congrats to all authors & a big thank you to the PC members, who all worked very hard on their reviews 💪 See you at #ESWC2024 https://semdh.github.io/accepted-papers.html @eswc_conf @lysander07
📢 Dear #FediHum community, I'm looking for support with extracting tabular data from old magazines. Together with the university library in #Rostock, we will work on the automatic identification of tables and the extraction of historical data.
If you are interested, please apply. For questions, contact me directly.
#DigitalHumanities #OpenScience #Wismar #FDM #MV @Textplus #NFDI @publicDH
(Screenshot from https://www.digitale-bibliothek-mv.de/viewer/image/PPN636776093_1920/5/)
Are you interested in #Software mentions in scholarly publications? Our SOMD (SOftware Mention Detection) shared task is now online 🥳 🤩
The shared task is part of the 1st workshop on Natural Scientific Language Processing and Knowledge Graphs (NSLP) co-located with @eswc_conf 2024
Further information: https://nfdi4ds.github.io/nslp2024/