Useful paper investigating the precision of various #reliability and #MeasurementError parameters under different conditions and study designs:
https://link.springer.com/article/10.1007/s10742-022-00293-9

It comes with an #RStats Shiny app to explore some of these conditions yourself:
https://iriseekhout.shinyapps.io/ICCpower/

#Psychometrics

Sample size recommendations for studies on reliability and measurement error: an online application based on simulation studies - Health Services and Outcomes Research Methodology

Simulation studies were performed to investigate for which conditions of sample size of patients (n) and number of repeated measurements (k) (e.g., raters) the optimal (i.e., balance between precise and efficient) estimations of intraclass correlation coefficients (ICCs) and standard error of measurements (SEMs) can be achieved. Subsequently, we developed an online application that shows the implications for decisions about sample sizes in reliability studies. We simulated scores for repeated measurements of patients, based on different conditions of n, k, the correlation between scores on repeated measurements (r), the variance between patients’ test scores (v), and the presence of systematic differences within k. The performance of the reliability parameters (based on one-way and two-way effects models) was determined by the calculation of bias, mean squared error (MSE), and coverage and width of the confidence intervals (CI). We showed that the gain in precision (i.e., largest change in MSE) of the ICC and SEM parameters diminishes at larger values of n or k. Next, we showed that the correlation and the presence of systematic differences have most influence on the MSE values, the coverage and the CI width. This influence differed between the models. As measurements can be expensive and burdensome for patients and professionals, we recommend to use an efficient design, in terms of the sample size and number of repeated measurements to come to precise ICC and SEM estimates. Utilizing the results, a user-friendly online application is developed to decide upon the optimal design, as ‘one size fits all’ doesn’t hold.
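A rough sense of this design trade-off can be sketched in a few lines. This is a minimal, hedged illustration of the kind of simulation described above, not the authors' code or the Shiny app: the variance components, n, k, and number of replications are arbitrary assumptions, and only a simple one-way model is shown.

```python
# Minimal sketch: simulate n patients rated k times, estimate a one-way ICC and SEM,
# and look at how variable those estimates are across repeated simulations.
import numpy as np

rng = np.random.default_rng(42)

def simulate_scores(n=50, k=3, var_between=4.0, var_error=1.0):
    """Scores = patient true value + random measurement error (illustrative variances)."""
    true = rng.normal(0.0, np.sqrt(var_between), size=(n, 1))
    error = rng.normal(0.0, np.sqrt(var_error), size=(n, k))
    return true + error

def icc_oneway(scores):
    """One-way random-effects ICC(1) and SEM from an n x k score matrix."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    msb = k * np.sum((row_means - grand) ** 2) / (n - 1)              # between-patient mean square
    msw = np.sum((scores - row_means[:, None]) ** 2) / (n * (k - 1))  # within-patient mean square
    icc = (msb - msw) / (msb + (k - 1) * msw)
    sem = np.sqrt(msw)                                                # SEM under the one-way model
    return icc, sem

# Precision of the estimates for one design (n=50, k=3)
estimates = np.array([icc_oneway(simulate_scores()) for _ in range(2000)])
print(f"ICC: mean={estimates[:, 0].mean():.3f}, SD={estimates[:, 0].std():.3f}")
print(f"SEM: mean={estimates[:, 1].mean():.3f}, SD={estimates[:, 1].std():.3f}")
```

Re-running the last block with larger n or k shows the diminishing gain in precision that the abstract describes.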


Beyond the Dataset

On the recent season of Clarkson's Farm, J.C. goes to great lengths to buy the right pub. As with any sensible buyer, the team does a thorough tear-down followed by a big build-up before the place is open for business. They survey how the place is built, located, and accessed. In the refresh they ensure that each part of the pub is built with purpose. Even the tractor on the ceiling. The art is in answering the question: how was this place put together?

A data scientist should be equally fussy. Until we trace how every number was collected, corrected, and cleaned (who measured it, what tool warped it, what assumptions skewed it), we can't trust the next step in our business to flourish.

Old sound (1925) painting in high resolution by Paul Klee. Original from the Kunstmuseum Basel Museum. Digitally enhanced by rawpixel.

Two load-bearing pillars

While there are many flavors of data science, I'm concerned here with the analysis done in scientific spheres and startups. In this world, the structure is held up by two pillars:

  • How we measure — the trip from reality to raw numbers. Feature extraction.
  • How we compare — the rules that let those numbers answer a question. Statistics and causality.

Both of these relate to a deep understanding of the data generation process, each from a different angle. A crack in either pillar and whatever sits on top crumbles: plots, significance tests, and AI predictions mean nothing.

    How we measure

    A misaligned microscope is the digital equivalent of crooked lumber. No amount of massaging can conjure a photon that never hit the sensor. In fluorescence imaging, the point-spread function tells you how a pin-point of light smears across neighboring pixels; noise reminds you that light arrives, and is recorded, with some inherent randomness. Misjudge either and the cell you call "twice as bright" may be a mirage.
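To make that concrete, here is a toy example (an illustration of my own, not from the post): a single point source pushed through a Gaussian point-spread function and Poisson shot noise. The PSF width and photon count are arbitrary assumptions.

```python
# A point source, blurred by the optics (PSF) and corrupted by photon shot noise.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

scene = np.zeros((64, 64))
scene[32, 32] = 1000.0                        # a single bright point (expected photon count)

blurred = gaussian_filter(scene, sigma=2.0)   # the PSF smears the point across neighboring pixels
image = rng.poisson(blurred)                  # photon arrival and detection are random

print("true peak:", scene.max(), "| recorded peak:", image.max())
```

Whatever analysis follows sees only `image`; any claim about relative brightness has to account for both the blur and the noise.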

    In this data generation process, the instrument's nuances control what you see. Understanding them lets us judge which kinds of post-processing are appropriate and which would destroy or invent data. For simpler analyses, post-processing can stop at cleaner raw data. For developing AI models, the process extends to labeling and analyzing data distributions. Andrew Ng's data-centric AI approach argues that tightening labels, fixing sensor drift, and writing clear provenance notes often beat fancier models.

    How we compare

    Now suppose Clarkson were to test a new fertilizer, fresh goat pellets, only on sunny plots. Any bumper harvest that follows says more about sunshine than about the pellets. Sound comparisons begin long before the data arrive. A deep understanding of the science behind the experiment is critical before any statistics are run. Botched randomization, missing controls, and lurking confounders eat away at the foundation of statistics.
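A minimal simulation makes the confounding concrete. This is an illustrative sketch, not anything from the show; every effect size in it is made up.

```python
# When fertilizer goes only on sunny plots, a naive comparison credits the pellets
# with what the sunshine did. Randomization breaks that link.
import numpy as np

rng = np.random.default_rng(1)
n = 500

sunny = rng.random(n) < 0.5                      # half the plots are sunny
fertilized = sunny                               # confounded design: pellets only on sunny plots
harvest = 10 + 5 * sunny + 0 * fertilized + rng.normal(0, 2, n)   # true pellet effect is zero

naive = harvest[fertilized].mean() - harvest[~fertilized].mean()
print(f"naive 'fertilizer effect': {naive:.2f}")  # ~5, entirely due to sunshine

fertilized_rand = rng.random(n) < 0.5            # randomized design
harvest_rand = 10 + 5 * sunny + 0 * fertilized_rand + rng.normal(0, 2, n)
randomized = harvest_rand[fertilized_rand].mean() - harvest_rand[~fertilized_rand].mean()
print(f"randomized estimate: {randomized:.2f}")   # ~0
```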

    This information is not in the data. Only understanding how the experiment was designed, and which events preclude others, enables us to build a model of the world of the experiment. Taking this lightly carries large risks for startups with limited budgets and smaller experiments: a false positive leads to wasted resources, while a false negative carries opportunity costs.

    The stakes climb quickly. Early in the COVID-19 pandemic, some regions bragged of lower death rates. Age, testing access, and hospital load varied wildly, yet headlines crowned local policies as miracle cures. When later studies re-leveled the footing, the miracles vanished. 

    Why the pillars get skipped

    Speed, habit, and misplaced trust. Leo Breiman warned in 2001 that many analysts chase algorithmic accuracy and skip the question of how the data were generated, a split he called the "two cultures." Today's tooling tempts us even more: auto-charts, one-click models, pretrained everything. They save time, right up until they cost us the answer.

    The other issue is the lack of a culture that communicates and shares a common language. Only in academic training is it possible to train a single person to understand the science, the instrumentation, and the statistics well enough that their research may be taken seriously; even then we prefer peer review. There is no such scope in startups. Tasks and expertise must be split. It falls to the data scientist to ensure clarity and to collect information horizontally, and it is the job of leadership to enable this or accept needless risk.

    Opening day

    Clarkson's pub opening was a monumental task, with a thousand details tracked and tackled by an army of experts. Follow the journey from phenomenon to file, guard the twin pillars of measure and compare, and reinforce them with careful curation and an open culture. Do that, and your analysis leaves room for the most important thing: inquiry.

    #AI #causalInference #cleanData #dataCentricAI #dataProvenance #dataQuality #dataScience #evidenceBasedDecisionMaking #experimentDesign #featureExtraction #foundationEngineering #instrumentation #measurementError #science #startupAnalytics #statisticalAnalysis #statistics

    #statstab #281 Correcting Cohen’s d for Measurement Error (A Method!)

    Thoughts: Scale reliability can be incorporated into effect size computation (i.e., remove attenuation)

    #measurementerror #effectsize #cohend #reliability #scales

    http://rpubs.com/JLLJ/RPBD

    RPubs - Classical Reliability Correction for Cohen's d and the Point-Biserial
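For readers who want the arithmetic: the classical correction amounts to dividing the observed d by the square root of the scale's reliability, because random measurement error inflates the observed standard deviation and thereby shrinks d. A minimal sketch, assuming only the continuous measure is unreliable:

```python
import math

def disattenuate_d(d_observed: float, reliability: float) -> float:
    """Correct an observed Cohen's d for unreliability of the continuous measure."""
    return d_observed / math.sqrt(reliability)

# Example: an observed d of 0.40 from a scale with reliability 0.70
print(round(disattenuate_d(0.40, 0.70), 3))   # ~0.478
```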

    Using Large Language Models for Qualitative Analysis can Introduce Serious #Bias
    https://arxiv.org/abs/2309.17147
    "…using LLMs to annotate text creates a risk of introducing biases that can lead to misleading inferences, in the sense that the errors that LLMs make in annotating are not random with respect to the characteristics of the subjects. Training simpler supervised models on high-quality human annotations with flexible coding leads to less #measurementError and bias than #LLM annotations"
    Using Large Language Models for Qualitative Analysis can Introduce Serious Bias

    Large Language Models (LLMs) are quickly becoming ubiquitous, but the implications for social science research are not yet well understood. This paper asks whether LLMs can help us analyse large-N qualitative data from open-ended interviews, with an application to transcripts of interviews with Rohingya refugees in Cox's Bazaar, Bangladesh. We find that a great deal of caution is needed in using LLMs to annotate text as there is a risk of introducing biases that can lead to misleading inferences. We here mean bias in the technical sense, that the errors that LLMs make in annotating interview transcripts are not random with respect to the characteristics of the interview subjects. Training simpler supervised models on high-quality human annotations with flexible coding leads to less measurement error and bias than LLM annotations. Therefore, given that some high quality annotations are necessary in order to assess whether an LLM introduces bias, we argue that it is probably preferable to train a bespoke model on these annotations than it is to use an LLM for annotation.
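A toy simulation helps unpack "errors that are not random with respect to the characteristics of the subjects". This is a generic illustration of differential versus non-differential misclassification, not the paper's data or models; all the rates below are invented.

```python
# If annotation errors hit one group harder than another, a spurious group
# difference appears; if the errors are the same for everyone, estimates are
# noisier but the (zero) group gap survives.
import numpy as np

rng = np.random.default_rng(7)
n = 20_000

group = rng.random(n) < 0.5                  # some subject characteristic
true_label = rng.random(n) < 0.30            # true prevalence equal in both groups

flip_same = rng.random(n) < 0.10             # non-differential: 10% errors for everyone
label_same = true_label ^ flip_same

flip_rate = np.where(group, 0.25, 0.05)      # differential: far more errors in one group
flip_diff = rng.random(n) < flip_rate
label_diff = true_label ^ flip_diff

def gap(label):
    """Estimated prevalence difference between the two groups."""
    return label[group].mean() - label[~group].mean()

print(f"true gap:              {gap(true_label):+.3f}")  # ~0
print(f"non-differential errs: {gap(label_same):+.3f}")  # still ~0
print(f"differential errs:     {gap(label_diff):+.3f}")  # spurious gap
```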


    If you want to know more about...
    'Adjustment Methods for Data Quality Problems: #MissingData, #MeasurementError and #Misclassification',
    join #AlbertVarela and me for our 2-day #NCRM short course on the 10th and 11th of January at Leeds.

    https://www.ncrm.ac.uk/training/show.php?article=13157


    'Exploring the impact of measurement error in police recorded crime rates through sensitivity analysis' out in Crime Science.

    #Criminology #crimodon #CriminalJustice #police #CrimeData #Bias #MeasurementError @criminology

    https://link.springer.com/article/10.1186/s40163-023-00192-5#Tab1

    Exploring the impact of measurement error in police recorded crime rates through sensitivity analysis - Crime Science

    It is well known that police recorded crime data is susceptible to substantial measurement error. However, despite its limitations, police data is widely used in regression models exploring the causes and effects of crime, which can lead to different types of bias. Here, we introduce a new R package (‘rcme’: Recounting Crime with Measurement Error) that can be used to facilitate sensitivity assessments of the impact of measurement error in analyses using police recorded crime rates across a wide range of settings. To demonstrate the potential of such sensitivity analysis, we explore the robustness of the effect of collective efficacy on criminal damage across Greater London’s neighbourhoods. We show how the crime reduction effect attributed to collective efficacy appears robust, even when most criminal damage incidents are not recorded by the police, and if we accept that under-recording rates are moderately affected by collective efficacy.
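The general idea behind this kind of sensitivity analysis can be sketched as follows. This is a hedged illustration only (recorded crime = true crime × a recording rate that may itself depend on the explanatory variable), not the 'rcme' package or its API; every number below is invented.

```python
# How does the estimated effect of "collective efficacy" on crime move as we
# assume worse under-recording, and recording that depends on efficacy itself?
import numpy as np

rng = np.random.default_rng(3)
n = 1_000

efficacy = rng.normal(0, 1, n)                                    # neighbourhood collective efficacy
true_rate = np.exp(1.0 - 0.3 * efficacy + rng.normal(0, 0.2, n))  # true crime rate, effect = -0.3

def estimated_effect(recording_base, dependence):
    """Slope of log(recorded rate) on efficacy under an assumed recording process."""
    recording = np.clip(recording_base + dependence * efficacy, 0.05, 1.0)
    recorded = true_rate * recording
    return np.polyfit(efficacy, np.log(recorded), 1)[0]

for base in (1.0, 0.6, 0.3):          # average share of incidents recorded
    for dep in (0.0, 0.05):           # how strongly recording rises with efficacy
        print(f"recording~{base:.1f}, dependence={dep:.2f}: effect ~ {estimated_effect(base, dep):+.3f}")
```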


    David Buil and me chatting with Iain Brennan about the #RecountingCrime project and different problems with crime measurement
    https://innovativecriminology.podbean.com/e/measuring-crime-in-small-areas/
    in what is the first episode of the
    'Innovative Methods in Criminology' podcast.

    #Criminology, #Crime, #MeasurementError, #PoliceData, @criminology

    Measuring crime in small areas | Innovative Methods in Criminology

    Jose Pina-Sanchez and David Buil-Gil discuss their ESRC project with Ian Brunton-Smith and Alexandru Cernat, Recounting Crime, generating exciting new ways to better measure crime in small areas.

    #introduction hey, everyone. I'm a gay psychologist who focuses on measuring cognitive ability, quantitative psychology, and psychometrics. I work mostly in measuring educational achievement. I have a sociological background in Marxian philosophy and am dedicated to social justice. In my spare time I paint portraits and exercise. #PsychologicalMethods #painting #measurementerror #MeasurementInvariance

    Here goes my #Introduction:

    #Economics and #SocialStats by training, #Quantitative #Criminology by trading.

    Interested in all things #SocialScience and #ResearchMethods

    Active in:
    #Sentencing, where I try to operationalise elusive concepts like #Consistency, #Individualisation, #Proportionality, #Severity or #Discrimination.

    And #Measurement / #MeasurementError, especially interested in #PoliceData and #CrimeData, their flaws, implications and adjustments, www.recountingcrime.com.

    Ok, anyone on Mastodon interested in #sentencing, #criminaljustice, #disparities, #measurementerror, #missingdata, #causalinference, #statisticalmodelling, #openscience? I still do not know how this works 🙃