Andrea Nini

@andreanini
Senior Lecturer in Linguistics and English Language at the University of Manchester
Website: https://andreanini.com/
Github: https://github.com/andreanini
Google Scholar: https://scholar.google.com/citations?user=NP1NQxEAAAAJ&hl=en
Bluesky: https://bsky.app/profile/andreanini.com

New event! Our first German introductory course to forensic linguistics!

Join the 1st introductory course "Grundlagen der Forensischen #Linguistik" (Fundamentals of Forensic Linguistics) on 28 and 29 September 2026 in Düsseldorf!

Apply by 21 May 2026!

https://div-ling.org/gfl

My #RStats package for forensic authorship analysis, 'idiolect', now has an associated paper published in the Journal of Open Source Software: https://doi.org/10.21105/joss.07575. New release of idiolect, 1.2.0, also available from #CRAN.
#forensic #linguistics

http://andreanini.com/2026/03/15/idiolect-an-r-package-for-forensic-authorship-analysis/

idiolect: An R package for forensic authorship analysis

Nini, A., (2026). idiolect: An R package for forensic authorship analysis. Journal of Open Source Software, 11(119), 7575, https://doi.org/10.21105/joss.07575

Journal of Open Source Software

My latest paper with Hugo Bowles and Claire Wood examines a Dickens mystery: did he author the recently decoded story “The Two Brothers”? The answer is complicated. The paper showcases our new method, LambdaG (forthcoming!).

http://andreanini.com/2026/01/26/investigating-a-dickens-mystery/

Investigating a Dickens mystery


New article in #JCLS 4(1)! 🎉
@dudarjulia & @christof introduce a method for evaluating measures of #distinctiveness ( #keyness ) using synthetically generated, fully controlled text data.
#CLS #TextAnalysis #Evaluation #NLP #NLG #LiteraryComputing #CCLS25
https://jcls.io/issue/118/info/
Journal of Computational Literary Studies | Issue: 1(4) (2025)

On Monday, Dr James Tompkinson (University of York) and I presented this paper at the IAFPA 2025 conference, where we showed preliminary results of applying authorship analysis techniques to transcribed speech. You can find the slides at the link below.

http://andreanini.com/2025/07/22/assessing-the-suitability-of-forensic-authorship-analysis-methodologies-for-speech-data/

Assessing the suitability of forensic authorship analysis methodologies for speech data

On Monday Dr James Tompkinson (University of York) and I presented our talk on “Assessing the suitability of forensic authorship analysis methodologies for speech data” at the Internati…


Abstract and slides for my #DH2025 talk "Examining an author’s individual grammar" can now be found on my website below. This also includes a tutorial to use LambdaG to study authors' idiosyncratic grammar patterns.

http://andreanini.com/2025/07/16/examining-an-authors-individual-grammar/

Examining an author’s individual grammar

On Monday I delivered a talk at the Comparative Literature Goes Digital Workshop at the Digital Humanities 2025 conference. As part of this talk I have also prepared a tutorial to use our new autho…

@stefan_hessbrueggen @christof Thanks! This is actually the first version of the paper, though. To see the most up to date version it's better if you use the general arXiv link (just remove v1): https://arxiv.org/abs/2403.08462
Authorship Verification based on the Likelihood Ratio of Grammar Models

Authorship Verification (AV) is the process of analyzing a set of documents to determine whether they were written by a specific author. This problem often arises in forensic scenarios, e.g., in cases where the documents in question constitute evidence for a crime. Existing state-of-the-art AV methods use computational solutions that are not supported by a plausible scientific explanation for their functioning and that are often difficult for analysts to interpret. To address this, we propose a method relying on calculating a quantity we call $\lambda_G$ (LambdaG): the ratio between the likelihood of a document given a model of the Grammar for the candidate author and the likelihood of the same document given a model of the Grammar for a reference population. These Grammar Models are estimated using $n$-gram language models that are trained solely on grammatical features. Despite not needing large amounts of data for training, LambdaG still outperforms other established AV methods with higher computational complexity, including a fine-tuned Siamese Transformer network. Our empirical evaluation based on four baseline methods applied to twelve datasets shows that LambdaG leads to better results in terms of both accuracy and AUC in eleven cases and in all twelve cases if considering only topic-agnostic methods. The algorithm is also highly robust to important variations in the genre of the reference population in many cross-genre comparisons. In addition to these properties, we demonstrate how LambdaG is easier to interpret than the current state-of-the-art. We argue that the advantage of LambdaG over other methods is due to the fact that it is compatible with Cognitive Linguistic theories of language processing.
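
The likelihood-ratio idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the bigram order, the add-one smoothing, the part-of-speech-style tags standing in for "grammatical features", and all function names (`train_bigram`, `doc_loglik`, `log_lambda`) are assumptions made purely for illustration. It computes the log of the ratio between a document's likelihood under an author model and under a reference-population model.

```python
import math
from collections import Counter


def train_bigram(sentences, vocab):
    """Return a function giving add-one smoothed log P(cur | prev).

    Assumption: add-one smoothing and bigrams are illustrative choices only.
    """
    bigrams = Counter()
    contexts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    v = len(vocab) + 1  # possible next symbols: vocabulary plus "</s>"

    def logprob(prev, cur):
        return math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))

    return logprob


def doc_loglik(doc, logprob):
    """Sum the log-probabilities of every bigram in every sentence."""
    total = 0.0
    for sent in doc:
        tokens = ["<s>"] + sent + ["</s>"]
        total += sum(logprob(p, c) for p, c in zip(tokens, tokens[1:]))
    return total


def log_lambda(doc, author_sents, reference_sents):
    """Log likelihood ratio: author model versus reference model."""
    vocab = {t for s in author_sents + reference_sents + doc for t in s}
    author_model = train_bigram(author_sents, vocab)
    reference_model = train_bigram(reference_sents, vocab)
    return doc_loglik(doc, author_model) - doc_loglik(doc, reference_model)


# Hypothetical data: sentences reduced to coarse grammatical tags.
author = [["DET", "NOUN", "VERB"]] * 3
reference = [["VERB", "DET", "NOUN"]] * 3
questioned = [["DET", "NOUN", "VERB"]]
print(log_lambda(questioned, author, reference))
```

A positive value means the questioned document is more likely under the candidate author's model than under the reference model; a negative value points the other way.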

arXiv.org

I recently had the pleasure of being a guest on the ‘Writing Wrongs’ podcast, where we discussed the Ayia Napa rape case and my trial evidence. I highly recommend this episode and the whole podcast: https://www.aston.ac.uk/research/forensic-linguistics/writing-wrongs

http://andreanini.com/2025/07/08/appearance-on-the-writing-wrongs-podcast/

Writing Wrongs

The slides and abstract of our talk "A corpus analysis of idiolectal n-grams" at #CL2025 are now available here: https://doi.org/10.5281/zenodo.15806985

http://andreanini.com/2025/07/04/a-corpus-analysis-of-idiolectal-n-grams/

A corpus analysis of idiolectal n-grams

This study explores linguistic individuality - each individual’s unique repertoire of units (sequences of words, morphemes or parts of speech) that they use recurrently - through a corpus-based analysis. Whilst previous research tends to focus on collective linguistic features, this study targets fine-grained, individual-specific patterns that can be identified through computational authorship verification techniques. Some of these sequences are highly specific to an individual, such as Tony Blair’s use of "entirely understand" (Mollin 2009). However, intuitively there is also overlap across the repertoires of units that different individuals possess, for instance very common lexical bundles such as "I said to him" (Biber et al. 2021). As such, it is more often the combination of a large number of core grammatical constructions, as opposed to a small number of noticeably idiosyncratic phrases, that results in greater variation between authors than within one individual’s language (Barlow 2013). The present study found that across 18 different authors, each writing two summaries of the exact same text 30 days apart, only one character 7-gram featured across all of the texts. All authors used at least one long character n-gram (7-9 characters) in both texts that was entirely unique to them. The study also explores whether the component units within the n-grams differ between what is entirely individual, yet consistent, and what is used consistently, but is shared by other members of the group. The implications of this research centre on enhancing our understanding of why authorship analysis methods work, producing empirical evidence of cognitive linguistic theories of individuality, which a limited number of existing studies have aimed to investigate, and exemplifying the benefits and possibilities of applying corpus linguistic methodologies to authorship analysis problems.

List of references

Barlow, Michael. 2013. Individual Differences and Usage-based Grammar. International Journal of Corpus Linguistics 1, 443-478.

Biber, Douglas, Stig Johansson, Geoffrey N Leech, Susan Conrad & Edward Finegan. 2021. Lexical Expressions in Speech and Writing. In Grammar of Spoken and Written English, 979-1030. Amsterdam: John Benjamins Publishing Company.

Mollin, Sandra. 2009. “I entirely understand” is a Blairism: The Methodology of Identifying Idiolectal Collocations. International Journal of Corpus Linguistics 14, 367-392.
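
The two counts reported in the abstract (character n-grams found in every text, and n-grams an author uses in both of their texts but nobody else uses) can be sketched with plain set operations. This is only a rough illustration under assumptions: the function names and the toy texts are hypothetical, and the study's actual preprocessing and tooling are not described in the abstract.

```python
def char_ngrams(text, n):
    """Set of all character n-grams occurring in a text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def shared_and_unique(texts_by_author, n):
    """Return (n-grams present in every text, per-author unique n-grams).

    An n-gram counts as unique to an author when it occurs in all of
    that author's texts and in none of the other authors' texts.
    """
    # n-grams each author uses consistently, i.e. in all of their own texts
    consistent = {
        author: set.intersection(*(char_ngrams(t, n) for t in texts))
        for author, texts in texts_by_author.items()
    }
    # n-grams each author uses in at least one text
    used = {
        author: set.union(*(char_ngrams(t, n) for t in texts))
        for author, texts in texts_by_author.items()
    }
    shared_by_all = set.intersection(*consistent.values())
    unique = {}
    for author, grams in consistent.items():
        others = set().union(*(used[b] for b in used if b != author))
        unique[author] = grams - others
    return shared_by_all, unique


# Hypothetical toy corpus: two authors, two texts each.
corpus = {
    "A": ["the cat sat here", "the cat sat there"],
    "B": ["a dog ran here", "a dog ran there"],
}
shared, unique = shared_and_unique(corpus, 4)
print(sorted(shared))
print({author: sorted(grams) for author, grams in unique.items()})
```

With real data, `n` would range over the lengths of interest (the abstract mentions 7 to 9 characters), and each author would contribute their two summaries.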

Zenodo