Recs for text analysis tools with no or only minimal GenAI: Taguette, QDA Miner, what else? I'm mainly looking for common-word analysis across a batch of documents (around 50 papers), plus labelling of individual documents. Open source, free, Windows 10.
#QualitativeData #textanalysis #software #research #academia #academicchatter #opensource
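
For the bulk common-word part, a minimal sketch using only the Python standard library might look like the following. It is not how Taguette or QDA Miner work; the stop-word list, file names, and sample texts are made up for illustration, and loading the real papers (PDF or .docx to plain text) is assumed to happen elsewhere.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis needs a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def common_words(docs, top_n=10):
    """Return (word, total_count, doc_count) tuples for the most frequent words.

    docs maps a document name to its plain text.
    """
    totals = Counter()    # occurrences across the whole corpus
    doc_freq = Counter()  # number of documents that contain the word
    for text in docs.values():
        tokens = tokenize(text)
        totals.update(tokens)
        doc_freq.update(set(tokens))
    return [(w, c, doc_freq[w]) for w, c in totals.most_common(top_n)]


# Made-up sample documents standing in for the ~50 papers.
docs = {
    "paper_a.txt": "Interviews reveal trust in the clinic. Trust matters.",
    "paper_b.txt": "Trust and access shape the interviews.",
}
for word, total, n_docs in common_words(docs, top_n=3):
    print(f"{word}: {total} occurrences in {n_docs} documents")
```

Reporting the document count next to the raw total helps tell words that are common across the corpus from words that one long paper repeats many times.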

Discover google/langextract, a game-changing Python library for extracting insights from unstructured text with LLMs! #LLM #NLP #TextAnalysis

The google/langextract library leverages Large Language Models (LLMs) to extract structured information from unstructured text, enabling precise source grounding and interactive visualization capabilities. By utilizing Python, developers can integrate this...

#google/langextract #LargeLanguageModels #NaturalLanguageProcessing #PythonLibraries

#5WAnalysis #5W #textanalysis #phânTíchVanBản #5YếuTố
Need to identify the 5 elements (Who, What, Where, When, Why) when analyzing a text? Looking for a tool that analyzes logically, without speculation, and automatically finds online sources for verification. What method do you usually use?

(None: the original post lacks specific information about the platform or practical examples)

https://www.reddit.com/r/LocalLLaMA/comments/1p80l7x/how_to_analyse_text_for_5w/

New article in #JCLS 4(1)! 🎉
@dudarjulia & @christof introduce a method for evaluating measures of #distinctiveness ( #keyness ) using synthetically generated, fully controlled text data.
#CLS #TextAnalysis #Evaluation #NLP #NLG #LiteraryComputing #CCLS25
https://jcls.io/issue/118/info/
Journal of Computational Literary Studies | Issue: 1(4) (2025)
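
To make the idea of testing a keyness measure on controlled data concrete, here is a small sketch. It is not the method from the article: Dunning's log-likelihood (G2) is used only because it is a widely known keyness measure, and the vocabulary, weights, and corpus sizes are invented for illustration.

```python
import math
import random
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """Dunning's log-likelihood (G2) for a word seen `a` times in a target
    corpus of `total_a` tokens and `b` times in a reference corpus of
    `total_b` tokens. The score is unsigned: over- and under-use both raise it."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def synthetic_corpus(rng, vocab, weights, size):
    """Draw `size` tokens from `vocab` with the given sampling weights."""
    return rng.choices(vocab, weights=weights, k=size)


rng = random.Random(0)  # fixed seed so the run is repeatable
vocab = ["alpha", "beta", "gamma", "delta"]
target = synthetic_corpus(rng, vocab, [4, 1, 1, 1], 2000)  # "alpha" is boosted on purpose
reference = synthetic_corpus(rng, vocab, [1, 1, 1, 1], 2000)

t_counts, r_counts = Counter(target), Counter(reference)
scores = {
    w: log_likelihood(t_counts[w], r_counts[w], len(target), len(reference))
    for w in vocab
}
top = max(scores, key=scores.get)
print(top, round(scores[top], 1))
```

Because the boosted word is known in advance, the check is simply whether the measure ranks it first, which is the appeal of fully controlled data.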

📚 Qualitative analysis for large volumes of text with LLMs
Use #IA (AI) to boost your qualitative text analysis in #RStats via the Gemini API.
With Ismael Aguayo and Exequiel Trujillo

📅 Dec 2, 14:00–16:15 UTC-3, online
💵 Students USD 5 · Academics USD 10 · Industry USD 15
🔗 https://www.eventbrite.com.ar/e/1962623150676

#LatinR2025 #TextAnalysis

Charting Twain: Building a Character Interaction Graph with Quarkus, OpenNLP, and a local Ollama Model. Uncover hidden dynamics in Huckleberry Finn using Java, sentiment analysis, and modern NLP.
https://myfear.substack.com/p/text-analytics-quarkus-opennlp-huckleberry-finn
#Java #Quarkus #OpenNLP #TextAnalysis

Ah, the groundbreaking revelation that #LLMs don't handle more words as well as they handle fewer. 🤯 Who knew that feeding a massive text blob would confuse a glorified autocomplete? 😂 Next week: water is wet! 🌊
https://research.trychroma.com/context-rot #textanalysis #AIhumor #technews #revelations #HackerNews #ngated
Context Rot: How Increasing Input Tokens Impacts LLM Performance

Wow! #QualiService could be a great resource!

It wasn't obvious to me how to find the transcripts for this doctor-patient interaction data from 4 countries, but if such transcripts are accessible, that's GREAT!

https://www.qualiservice.org/en/qsearch.html?q=diagnosis

#medicine #openData #cogSci #TextAnalysis

Qualitative social science datasets search - Qualiservice Data Sharing

🇪🇺 Want to analyze text from the EU public consultations? These consultations are how the EU invites the broader public to comment on upcoming legislation.

📦 I just published a first version of a Python package {eu-consultations} to scrape and extract text from the EU website:
https://github.com/marioangst/eu_consultations

- download consultation data as displayed on the EU's frontend into a validated form
- download associated files (this is the hard part about analysing this data - lots of feedback is in .docx and .pdf files)
- extract text from the files using docling and attach to feedback

You get all the data in validated form, possibly stored in huge (sorry for that) JSON files ;).

This package is part of an analysis project on the feedback the EU has received on digital policy via the public consultation process, which we plan to present later this year, but I thought: let's open-source some of the tools we use well before then.

#python #textanalysis #policyanalysis #CompSocSci

GitHub - marioangst/eu_consultations: eu-consultations: A Python package for scraping textual data from EU public consultations

Useful contribution to discussions in this area, for sure! The results highlight "whether an automated approach that would still require micromanaging and adjusting several variables by the human researcher would, in fact, be more efficient an approach compared to the same tasks performed manually by human labour"

Out of Context! Managing the Limitations of Context Windows in #ChatGPT-4o Text Analyses https://doi.org/10.46298/jdmdh.15090 #DigitalHumanities #TextAnalysis #LLM #ArtificialIntelligence #GLAMR

In recent years, large language model (LLM) applications have surged in popularity, and academia has followed suit. Researchers frequently seek to automate text annotation – often a tedious task – and, to some extent, text analysis. Notably, popular LLMs such as ChatGPT have been studied as both research assistants and analysis tools, revealing several concerns regarding transparency and the nature of AI-generated content. This study assesses ChatGPT’s usability and reliability for text analysis – specifically keyword extraction and topic classification – within an “out-of-the-box” zero-shot or few-shot context, emphasizing how the size of the context window and varied text types influence the resulting analyses. Our findings indicate that text type and the order in which texts are presented both significantly affect ChatGPT’s analysis. At the same time, context-building tends to be less problematic when analyzing similar texts. However, lengthy texts and documents pose serious challenges: once the context window is exceeded, “hallucinated” results often emerge. While some of these issues stem from the core functioning of LLMs, some can be mitigated through transparent research planning.

Episciences
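
One common way to keep long documents inside a context window is to split them into overlapping chunks before sending them to a model. The sketch below is an assumption for illustration, not a procedure taken from the article. It counts whitespace-separated words as a rough stand-in for tokens, and the `max_words` and `overlap` defaults are arbitrary; a real setup would count tokens with the model's own tokenizer.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into chunks of at most `max_words` words, where each chunk
    repeats the last `overlap` words of the previous one for continuity."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Made-up input: 500 placeholder words.
sample = " ".join(f"word{i}" for i in range(500))
for i, chunk in enumerate(chunk_words(sample, max_words=200, overlap=20)):
    print(i, len(chunk.split()))
```

Chunking keeps each request within a fixed budget, but results from separate chunks still have to be merged by the researcher, which is part of the manual effort the post above questions.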