Fresh #CLS research ahead!!

The issue with the papers from the #2023 #conference in #Würzburg is almost complete. Find the papers published so far here: https://jcls.io/issue/105/info/

There’s already a lot of exciting research there, and more is to come!

#Computational #Literary #Studies

Journal of Computational Literary Studies | Issue: 1(2) (2023)

All of the papers from the 2023 edition of the #CCLS, the Conference on Computational Literary Studies, are now available!

It's an issue packed, of course, with top-notch #CLS research, so you should definitely have a look! We're highlighting one paper per day over the next two weeks, so stay tuned and follow this account!

URL: https://jcls.io/issue/105/info/

@peertrilcke @EvelynGius @fotis_jannidis @SvenjaGuhr @u_henny

First up: Frederik Arnold and Robert Jäschke (2023). “A Novel Approach for Identification and Linking of Short Quotations in Scholarly Texts and Literary Works”. In: Journal of Computational Literary Studies 2 (1). doi: https://doi.org/10.48694/jcls.3590

Keywords: quotation linking, literary works, scholarly works, machine learning, language models

The figure shows a visualization of quotation identification and linking in three steps.

A Novel Approach for Identification and Linking of Short Quotations in Scholarly Texts and Literary Works

We present two approaches for the identification and linking of short quotations between scholarly works and literary works: ProQuo, a specialized pipeline, and ProQuoLM, a more general language-model-based approach. Our evaluation shows that both approaches outperform a strong baseline and that their overall performance is on the same level. We compare the performance of ProQuoLM on texts with and without (page) reference information and find that reference information is not used. Based on our findings, we propose the following steps for future improvements: further analysis of the influence of a bigger context window for better handling of long-distance references, and the introduction of positional information of the literary work so that reference information can be utilized by ProQuoLM.

Journal of Computational Literary Studies

Next: Harri Kiiskinen, Asko Nivala, Jasmine Westerlund, and Juhana Saarelainen (2023). “Extracting Geographical References from Finnish Literature. Fully Automated Processing of Plain-Text Corpora”. In: Journal of Computational Literary Studies 2 (1). doi: https://doi.org/10.48694/jcls.3584.

Keywords: named entity recognition, geographic information system, #geoparsing, linked open data, literary geography, #Finland

#JCLS #CLS #LOD #NER

Extracting Geographical References from Finnish Literature. Fully Automated Processing of Plain-Text Corpora

In the Atlas of Finnish Literature 1870-1940 project, we extract geographical information from a Finnish-language corpus of literary texts published between 1870 and 1940. The texts are transformed from plain texts to TEI/XML, and further processed with named entity recognition and linking tools. The results are presented in a web-based environment. This article describes the technical structure of the analysis chain, the tools used and the metaprocesses used to manage the research dataset.

Also on #novels, but this time focused on #Hungary:

Botond Szemes (2023), "Stylistic History of the Hungarian Novel Based on Sentence Structures", Journal of Computational Literary Studies 2 (1), 1–25. doi: https://doi.org/10.48694/jcls.3582.

Keywords: literary history, classification, epistemology, #stylometry, #sentence structure

Stylistic History of the Hungarian Novel Based on Sentence Structures

The paper presents a method for the automatic identification of different types of compound and complex sentences in Hungarian through the analysis of conjunctions and their positions. This method opens up new perspectives in stylometry: on the one hand, conjunctions as function words provide a large amount of data for statistical analyses, and on the other hand, they also carry meaning - about the relations between clauses (e.g. opposition, conditionality). By examining the relative frequency of each type, it is possible to reveal the most typical relations between clauses in a given text or corpus. In this way, the style of novels can be described at the level of the sentence, while also revealing the topological-logical structure and epistemological attitude of the texts, which is not usually reflected in the reading process. This method also provides an opportunity to identify different stylistic traditions in literary history.
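
To make the idea of "relative frequency of each type" concrete, here is a minimal sketch. It is not the paper's method: the lexicon below is a hypothetical English toy mapping (the paper works on Hungarian and also uses the position of conjunctions), and normalising by the number of matched conjunctions is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon mapping conjunctions to clause relations.
# Invented for illustration; the paper analyses Hungarian conjunctions.
RELATION_BY_CONJUNCTION = {
    "but": "opposition",
    "however": "opposition",
    "if": "conditionality",
    "because": "causality",
    "and": "addition",
}


def relation_frequencies(tokens, lexicon=RELATION_BY_CONJUNCTION):
    """Return the share of each clause relation among matched conjunctions."""
    lowered = (token.lower() for token in tokens)
    counts = Counter(lexicon[token] for token in lowered if token in lexicon)
    total = sum(counts.values())
    if total == 0:
        return {}  # no known conjunction found in the text
    return {relation: count / total for relation, count in counts.items()}


if __name__ == "__main__":
    sample = "If it rains , we stay , but if it clears , we go".split()
    print(relation_frequencies(sample))
```

Comparing such profiles across novels or periods is the kind of sentence-level description the abstract refers to; a real analysis would need a proper tokenizer and a linguistically motivated lexicon.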

Not on novels, but on Holocaust survivor testimonies:

Eitan Wagner, Renana Keydar, Amir Pinchevski & Omri Abend (2023). "Automatic Topic-Guided Segmentation of Holocaust Survivor Testimonies", Journal of Computational Literary Studies 2 (1), 1–26. doi: https://doi.org/10.48694/jcls.3580.

Keywords: #segmentation, #spoken narratives, #testimonies, #narrative analysis, #topic analysis, mutual information, #NLP #CLS #JCLS

Automatic Topic-Guided Segmentation of Holocaust Survivor Testimonies

In recent decades, efforts have been made to gather and digitize the testimonies of living Holocaust survivors. The challenge we now face is attending to those thousands of human stories, which, while safely stored in archives, may nevertheless disappear into oblivion. Despite recent advances in narrative analysis in the fields of Computational Literature (CL) and Natural Language Processing (NLP), existing language model technology still faces challenges in analyzing elaborate narratives and long texts. One such challenge is text segmentation – a long-standing issue in the area of CL and NLP. In our work, we propose a computational method to approach this problem. Our research draws on testimony transcripts from the Shoah Foundation (SF) Holocaust archive for supervised topic classification, which is then used as topic guidance for automatic segmentation.

This one is about #novels again, but in #Portuguese:

Cláudia Freitas & Diana Santos (2023), "#Gender Depiction in Portuguese", Journal of Computational Literary Studies 2 (1). doi: https://doi.org/10.48694/jcls.3576.

Keywords: distant reading, annotation, #Portuguese literature, #Brazilian #literature

#CLS #JCLS #GenderStudies

Gender Depiction in Portuguese

In this paper, we look at how masculine and feminine characters are described in literature in Portuguese, using a publicly available literary corpus: Literateca. We investigate the words used to characterise human beings, after classifying them in four broad categories, namely those related to the social, appearance, character and emotional axes. We study the influence of genre, literary school, author gender, and time, among others.

Another paper, this time on #translation of #Japanese #poetry:

Xudong Chen, Hilofumi Yamamoto & Bor Hodošček (2023), "Translation-based connotation #visualization for classical poetic Japanese vocabulary of the Kokin Wakashū ca. 905", Journal of Computational Literary Studies 2 (1), 1–32. doi: https://doi.org/10.48694/jcls.3596.

Keywords: classical poetic Japanese text, parallel corpora, #connotation, #operationalization, parallel corpora connotation, Kokin Wakashū

Translation-based connotation visualization for classical poetic Japanese vocabulary of the Kokin Wakashū ca. 905

To offer a visualization of connotation in the classical poetic Japanese vocabulary of the Kokin Wakashū as an independent supplement for poetic language dictionaries, this paper presents an operationalization to tackle connotations using non-literal elements which are unveiled during the cross-cultural communication process, i.e., the translation process. Grounded in Schramm's communication model, we suggest calculating the set difference between the Kokin Wakashū and its ten contemporary Japanese translations to visualize the lexical explanatory additions (non-literal elements) in the translations. Methodologically, we apply the set difference in two distinct ways and implement the visualization on the six most frequent poetic flora words in the Kokin Wakashū, resulting in various depictions of non-literal elements. The set difference-based approaches to non-literal element visualization showed associative images and rhetorical techniques related to the flora words, which are two crucial aspects of connotation. The other aspects of connotation, such as encyclopedic knowledge, sociolinguistic style, and emotion, are not covered by the proposed visualizations.
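
For readers unfamiliar with the set-difference idea, here is a minimal sketch. It does not reproduce either of the paper's two variants: the function names are hypothetical, and the tokens are invented English stand-ins rather than actual Kokin Wakashū data.

```python
from collections import Counter


def added_tokens(original_tokens, translation_tokens):
    """Tokens that occur in the translation but not in the original poem."""
    return set(translation_tokens) - set(original_tokens)


def addition_counts(original_tokens, translations):
    """Count in how many translations each added token appears."""
    counts = Counter()
    for translation_tokens in translations:
        counts.update(added_tokens(original_tokens, translation_tokens))
    return counts


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    poem = ["plum", "blossom", "night"]
    translations = [
        ["plum", "blossom", "night", "fragrance"],
        ["plum", "blossom", "night", "fragrance", "darkness"],
    ]
    for token, n in addition_counts(poem, translations).most_common():
        print(token, n)
```

Tokens added by many translators are candidates for the "lexical explanatory additions" the abstract mentions; how they are weighted and visualized is up to the paper itself.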

This next paper is about #stylometry in a #translation setting involving novels in #Swedish and #Dutch:

Martje Wijers (2023), “Why the Daisy sisters are different. A stylometric study on the oeuvre of Swedish author Henning #Mankell and the Dutch translations of his work”, Journal of Computational Literary Studies 2 (1), 1–27. doi: https://doi.org/10.48694/jcls.3585

Keywords: #stylometry, #cluster analysis, #PCA, #delta, #zeta, #translation

Why the Daisy sisters are different. A stylometric study on the oeuvre of Swedish author Henning Mankell and the Dutch translations of his work

In this paper, 32 books by the Swedish writer Henning Mankell were investigated using stylometric methods, to find out whether his style varies in different genres, if his style changed measurably over time, or if his books differ from each other stylistically for other reasons. The results show that the time of publication can play a role, but that other factors, such as dominant verb tense used and narrative perspective, as well as register, are more important in determining whether and how the style of novels differs. This study also gives more insight into frequently used methods in stylometry, such as cluster analysis and PCA, that give little information about the stylistic features that differ between texts. For this purpose, the original Swedish texts were also compared to the Dutch translations of the same texts to determine how translation and language influence the results of stylometric analyses.

We continue with #novels, but the issue is now #privacy as a #topic:

Erik Ketzan, Jennifer Edmond and Carl Vogel (2023), “Need a Good Book about Privacy? Evaluating Dictionary-Based Corpus Query for Detecting the Topic of Privacy in Literary Texts”, Journal of Computational Literary Studies 2(1), 1–19. doi: https://doi.org/10.48694/jcls.3602.

Keywords: computational literary studies, privacy, corpus query, evaluation, long nineteenth century

Need a Good Book about Privacy? Evaluating Dictionary-Based Corpus Query for Detecting the Topic of Privacy in Literary Texts

This paper evaluates the usefulness of querying Vasalou et al.’s Privacy Dictionary (2011), a dictionary of 616 words and phrases, in 131 canonical English-language novels from the long 19th century. We evaluate the word frequencies compared with a classification of the novels based on scholarly attention to the topic of privacy in each particular text. We report that certain categories of Vasalou’s Privacy Dictionary appear promising for this task, but other Privacy Dictionary categories appear to be poor models. As a final step, we investigate the most promising sub-dictionaries of this Dictionary for our task using point biserial correlation, reporting evidence that three of its sub-dictionaries significantly correlate with scholarly attention to the topic of privacy.
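
As a rough illustration of the statistic named above: point-biserial correlation relates a binary variable (here, whether a novel has received scholarly attention for privacy) to a continuous one (here, a dictionary-category frequency). The sketch below uses the textbook formula with the population standard deviation; the function name and the numbers are invented, and significance testing is left out.

```python
import math
from statistics import mean, pstdev


def point_biserial(labels, values):
    """Point-biserial correlation between 0/1 labels and numeric values."""
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    group1 = [v for label, v in zip(labels, values) if label == 1]
    group0 = [v for label, v in zip(labels, values) if label == 0]
    n1, n0, n = len(group1), len(group0), len(values)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups must be non-empty")
    spread = pstdev(values)  # population standard deviation of all values
    if spread == 0:
        raise ValueError("values must not all be identical")
    return (mean(group1) - mean(group0)) / spread * math.sqrt(n1 * n0 / n**2)


if __name__ == "__main__":
    # Invented toy data: 1 = novel discussed by scholars in terms of privacy.
    attention = [0, 0, 1, 1]
    frequency_per_10k_words = [1.0, 2.0, 3.0, 4.0]
    print(round(point_biserial(attention, frequency_per_10k_words), 4))
```

The result is numerically the same as Pearson's r computed on the 0/1 labels, which is why it is a natural fit for comparing a binary classification of novels with word frequencies.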

Another new article in #JCLS, this one about #reconstructing #text from derived text formats using #LLMs:

Kai Kugler, Simon Münker, Johannes Höhmann, & Achim Rettinger (2023), “InvBERT: Reconstructing Text from Contextualized Word Embeddings by inverting the BERT pipeline”, Journal of Computational Literary Studies 2(1), 1–18. doi: https://doi.org/10.48694/jcls.3572

Keywords: contextualized word embeddings, derived text formats, text reconstruction, transformer encoder, publication restrictions

@achim

InvBERT: Reconstructing Text from Contextualized Word Embeddings by inverting the BERT pipeline

Digital Humanities and Computational Literary Studies apply automated methods to enable studies on large corpora which are not feasible by manual inspection alone. However, due to copyright restrictions, the availability of relevant digitized literary works is limited. Derived Text Formats (DTFs) have been proposed as a solution. Here, textual materials are transformed in such a way that copyright-critical features are removed, but that the use of certain analytical methods remains possible. Word embeddings produced by transformer-encoders are promising candidates for DTFs because they allow for state-of-the-art performance on analytical tasks. However, in this paper we demonstrate that under certain conditions the reconstruction of the original text from token representations becomes feasible. Our attempts to invert BERT suggest that publishing the encoder together with the contextualized embeddings is critical, since it makes it possible to generate data to train a decoder with a reconstruction accuracy sufficient to violate copyright laws.

The next article we want to highlight is on #novels again, this time about #sound in #Gothic fiction!

Svenja Guhr & Mark Algee-Hewitt (2023), “What's that Scary Sound? Ambient Sound in Gothic Fiction”, Journal of Computational Literary Studies 2(1), 1–28. doi: https://doi.org/10.48694/jcls.3583.

Keywords: sound studies, ambient sound, 19th century, literary prose, English, Gothic fiction

@SvenjaGuhr

What's that Scary Sound? Ambient Sound in Gothic Fiction

This paper presents an approach to operationalizing ambient sound as a literary phenomenon. To illustrate the importance of the ambient soundscape in literary studies, we both manually and automatically detect ambient sound markers and use these annotations to analyze a sample of nineteenth-century English novels and short stories. Our hypothesis is that descriptions of a story’s ambient soundscape can be associated with specific genres, and are, for example, a hallmark of Gothic novels. We use a classification approach based on a state-of-the-art transfer learning algorithm and a domain-dependent fine-tuned BERT model for English to automatically detect word-level sound indicators and compare their occurrence over the course of the fiction and with a comparative view on our corpus texts.

@jcls Thanks for highlighting our article!
It was exciting to dive into the scary ambient soundscapes of Gothic fiction, and as a side discovery, we found that short stories are particularly rich in sound representations, regardless of genre.
More on fictional soundscapes will be presented and published later this year 🎶 #LiterarySoundStudies

@jcls

Back to #poetry, this time in #German and regarding #emotions and #literaryhistory:

Leonard Konle, Merten Kröncke, Simone Winko, & Fotis Jannidis (2023), “Connecting the Dots. Variables of Literary History and Emotions in German-language Poetry”, Journal of Computational Literary Studies 2(1), 1–22. doi: https://doi.org/10.48694/jcls.3604.

Keywords: Bayesian hierarchical generalized linear model, literary history, German-language poetry, emotion

@fotis_jannidis @lkonle

Connecting the Dots. Variables of Literary History and Emotions in German-language Poetry

In this study, we will take the first steps toward a quantitative literary history, attempting to identify factors relevant to the history of literature and to model assumptions about the relations between them. We use a case study of German-language poetry in the transition from realism to early modernism to approach our methodological goal. Using a Bayesian hierarchical generalized linear model, we focus on one aspect relevant to the history of poetry, the emotions represented in the poems, and also include period, author profession, author gender, rhyme, and verse length in the model. We can confirm the important role of thematic genres and find unexpectedly high values for rhyme and author profession. We also discuss some of the methodological problems that our attempt to model this entangled network of variables involves.

@jcls Another paper we would like to highlight, again for the lovers of #novels

Dorothy Henriette Modrall Sperling, Mike Kestemont & Vincent Neyt (2023), “The Authorship of Stephen King’s Books Written Under the Pseudonym “Richard #Bachman”: A Stylometric Analysis”, Journal of Computational Literary Studies 2(1), 1–35. doi: https://doi.org/10.48694/jcls.3594

Keywords: #Stephen_King, #stylometry, #pop_culture, #authorship verification, contemporary English-language #fiction

@jcls Last but certainly not least, a paper on character agency in narrative fiction, by @andrewpiper:

Andrew Piper (2023), “What do characters do? The embodied agency of fictional characters”, Journal of Computational Literary Studies 2 (1): 1–12. doi: https://doi.org/10.48694/jcls.3589.

Keywords: literary #characters, #fiction, #narratology, #nlp, theory of #mind, embodied #cognition