Very happy to be in #Montpellier today for a "journée d'étude" (study day) on the notion of #tupleization in the context of #cooccurrence, #keyness, #frequency and #dispersion. The opening speaker is Stefan Th. Gries, and the full programme can be found here: https://corli.huma-num.fr/events/untangling-associations-advances-in-collocation-and-keyword-analysis/

#CorpusLinguistics #CLS

Key take-aways for me so far from Stefan Gries' talk:

We should be using "pure" measures that each measure only one thing, so that they are valid. Measures that opaquely conflate two or more aspects are not as useful as they could be.

We need multiple dimensions to fully describe a word's characteristics, but they need to be independent of each other / uncorrelated (see above), for it to make sense to "tupleize" them.

#tupleization #dispersion #frequency

Gries' final proposal: Let's measure #keyness in three components based on frequency, (pure) association and (pure) dispersion. Then, look at components separately. And/or parametrize the weights of these three components, depending on the needs and aims of your study.
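
A rough sketch of what "parametrizing the weights" could look like in practice. The weighted mean, the function name `weighted_keyness` and the assumption that all three component scores are already scaled to the range 0 to 1 are illustrative choices made here, not the formula presented in the talk:

```python
def weighted_keyness(frequency, association, dispersion, weights=(1.0, 1.0, 1.0)):
    """Collapse three component scores into one value via a weighted mean.

    Hypothetical helper: assumes each component is already scaled to [0, 1].
    """
    w_freq, w_assoc, w_disp = weights
    total = w_freq + w_assoc + w_disp
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return (w_freq * frequency + w_assoc * association + w_disp * dispersion) / total


# Made-up component scores for a single word.
word = {"frequency": 0.2, "association": 0.9, "dispersion": 0.5}

# Either inspect the three components separately, or collapse them
# with weights chosen for the study at hand.
balanced = weighted_keyness(**word)
association_heavy = weighted_keyness(**word, weights=(0.5, 2.0, 0.5))
```

Keeping the tuple around next to any combined score means the weighting can be changed later without recomputing the components.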

Martin Hilpert reports on a reanalysis / reimplementation study regarding the relationship between dispersion and degrees of generality or abstractness of meaning of words.

They compared results from a new dispersion measure (DPnofreq from Gries 2022) to earlier work with a dispersion measure that correlates with frequency (DP, Gries 2008). Results shift considerably!
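
For readers who have not met DP before, here is a minimal sketch of the older measure (deviation of proportions, Gries 2008): half the sum of absolute differences between the share of a word's occurrences in each corpus part and that part's share of the whole corpus. The function name `dp` and the toy counts are made up for illustration. DPnofreq is not implemented here, since the thread does not give its formula.

```python
def dp(word_counts, part_sizes):
    """Deviation of proportions (DP) for one word.

    word_counts[i]: occurrences of the word in corpus part i
    part_sizes[i]:  total number of tokens in corpus part i
    Returns 0.0 for a perfectly even distribution; values near 1 mean
    the word is concentrated in a small portion of the corpus.
    """
    if len(word_counts) != len(part_sizes):
        raise ValueError("need one word count per corpus part")
    total_word = sum(word_counts)
    total_size = sum(part_sizes)
    if total_word == 0 or total_size == 0:
        raise ValueError("word and corpus must have at least one token")
    observed = [count / total_word for count in word_counts]
    expected = [size / total_size for size in part_sizes]
    return 0.5 * sum(abs(o - e) for o, e in zip(observed, expected))


part_sizes = [1000, 1000, 2000]
even = dp([5, 5, 10], part_sizes)     # word spread in proportion to part sizes
clumped = dp([20, 0, 0], part_sizes)  # all occurrences in the first part
```

In this toy corpus, `even` comes out as 0.0 and `clumped` as 0.75.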

As I am interested in such replication / reproduction / reanalysis research, I of course love the final sentence from Martin Hilpert: "Revisiting old datasets with new methods, a scary but necessary endeavor." Could not agree more!

Next in a series of highlights: Ludovic Lebart, a pioneer of the statistical analysis of textual data in France, with a talk on "Dealing with low frequencies or high discrepancies of lexical frequencies: How to adapt the tools of textual data analysis to corpora of poems and lyrics."

Still going strong in the afternoon, now Bénédicte Pincemin (ENS Lyon), on "The Specificity Measure in Textometry: a Hermeneutic Use of the Fisher’s Exact Test".

#corpuslinguistics #statistics #montpellier

In a nutshell, I think Bénédicte's talk could be summarized as a "défense et illustration" of the Fisher-Yates-Exact (FYE) test used in the spécificité measure for keyness, followed by a "défense et illustration" of #TXM for investigating specificity results in detail. 😀
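
As a rough illustration of the calculation underlying such a test (a sketch, not TXM's actual implementation): given a corpus of `corpus_size` tokens in which a word occurs `word_total` times, the one-sided exact test asks how likely it is to see at least `word_in_part` occurrences in a subcorpus of `part_size` tokens if the word's occurrences were distributed at random. That probability is a hypergeometric tail. The function and parameter names are assumptions made for this example.

```python
from math import comb


def specificity_p(corpus_size, word_total, part_size, word_in_part):
    """Upper-tail hypergeometric probability P(X >= word_in_part).

    Hypothetical helper: X is the number of occurrences of the word in a
    random sample of part_size tokens drawn without replacement from a
    corpus of corpus_size tokens containing word_total occurrences.
    """
    if not (0 <= word_total <= corpus_size and 0 <= part_size <= corpus_size):
        raise ValueError("counts must fit inside the corpus")
    upper = min(word_total, part_size)
    if not 0 <= word_in_part <= upper:
        raise ValueError("word_in_part is outside the possible range")
    favourable = sum(
        comb(word_total, k) * comb(corpus_size - word_total, part_size - k)
        for k in range(word_in_part, upper + 1)
    )
    return favourable / comb(corpus_size, part_size)


# Tiny made-up example: 10 tokens overall, 3 of them the word of interest,
# a subcorpus of 4 tokens containing 2 occurrences of the word.
p = specificity_p(corpus_size=10, word_total=3, part_size=4, word_in_part=2)
```

Here `p` is 1/3. A small probability indicates that the word is over-represented in the subcorpus. Textometric tools usually report such probabilities on a logarithmic scale, a step left out of this sketch.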

My own talk, just now, was on “Keyness in Computational Literary Studies: History, Definitions and Evaluation”.

You find the presentation here: https://dhtrier.quarto.pub/montpellier

It is based on work with @dudarjulia, @cnDuKeli, Julia Röttgermann and @julianschroeter.

Relevant publications: https://doi.org/10.48694/jcls.10 (open) and https://zenodo.org/record/5707377 (private sharing)

#montpellier #keyness #distinctiveness #corpuslinguistics #cls

@christof ohhhh, so sad to miss this — it looks amazing!

I'd be particularly interested in getting a chance to read your paper. Do you intend to publish it anytime soon?

@alischinsky – Thanks for your interest in our work. The updated title is actually "Keyness in CLS: History, Definitions, Evaluation".

The talk mostly reports on collaborative work published here: https://doi.org/10.48694/jcls.102 and here: https://zenodo.org/record/5707377

@christof that's amazing, thanks so much!