SPIRE: Structure-Preserving Interpretable Retrieval of Evidence

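SPIRE proposes a structure-preserving pipeline for interpretable evidence retrieval from semi-structured documents such as HTML. To resolve the mismatch between document structure and the flat sequence interfaces of existing embedding and generative models, it represents candidates as subdocuments within a document and introduces global and local contextualization techniques to produce interpretable citations. Experiments show that combining structure preservation with contextualization yields higher-quality, more diverse citations under a fixed budget while remaining scalable.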

https://arxiv.org/abs/2604.20849

#informationretrieval #structureddocuments #embedding #contextualization #html

SPIRE: Structure-Preserving Interpretable Retrieval of Evidence

Retrieval-augmented generation over semi-structured sources such as HTML is constrained by a mismatch between document structure and the flat, sequence-based interfaces of today's embedding and generative models. Retrieval pipelines often linearize documents into fixed-size chunks before indexing, which obscures section structure, lists, and tables, and makes it difficult to return small, citation-ready evidence without losing the surrounding context that makes it interpretable. We present a structure-aware retrieval pipeline that operates over tree-structured documents. The core idea is to represent candidates as subdocuments: precise, addressable selections that preserve structural identity while deferring the choice of surrounding context. We define a small set of document primitives--paths and path sets, subdocument extraction by pruning, and two contextualization mechanisms. Global contextualization adds the non-local scaffolding needed to make a selection intelligible (e.g., titles, headers, list and table structure). Local contextualization expands a seed selection within its structural neighborhood to obtain a compact, context-rich view under a target budget. Building on these primitives, we describe an embedding-based candidate generator that indexes sentence-seeded subdocuments and a query-time, document-aware aggregation step that amortizes shared structural context. We then introduce a contextual filtering stage that re-scores retrieved candidates using locally contextualized views. Across experiments on HTML question-answering benchmarks, we find that preserving structure while contextualizing selections yields higher-quality, more diverse citations under fixed budgets than strong passage-based baselines, while maintaining scalability.

arXiv.org
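
The primitives named in the abstract above (paths, path sets, subdocument extraction by pruning, global contextualization) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the representation of a path as a tuple of child indices, the `SCAFFOLD` tag set, and the rule "keep titles and headings that precede the selection at each ancestor level" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Assumption: a path is a tuple of child indices from the root; () is the root.
Path = Tuple[int, ...]

# Assumption: tags treated as global scaffolding in this sketch.
SCAFFOLD = {"title", "h1", "h2", "h3"}


@dataclass
class Node:
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def node_at(root: Node, path: Path) -> Node:
    """Follow child indices from the root to the addressed node."""
    for i in path:
        root = root.children[i]
    return root


def extract(node: Node, paths: Set[Path], prefix: Path = ()) -> Optional[Node]:
    """Subdocument extraction by pruning: keep selected subtrees and
    the ancestors needed to reach them; drop everything else."""
    if prefix in paths:
        return node  # selected: keep the whole subtree
    kept = []
    for i, child in enumerate(node.children):
        sub = extract(child, paths, prefix + (i,))
        if sub is not None:
            kept.append(sub)
    if not kept:
        return None  # nothing selected beneath this node
    return Node(node.tag, node.text, kept)


def globalize(root: Node, paths: Set[Path]) -> Set[Path]:
    """Hypothetical global contextualization: for each selected path, also
    select scaffolding nodes that precede it at every ancestor level."""
    out = set(paths)
    for path in paths:
        for depth in range(len(path)):
            parent = node_at(root, path[:depth])
            for i, child in enumerate(parent.children[: path[depth]]):
                if child.tag in SCAFFOLD:
                    out.add(path[:depth] + (i,))
    return out


def render(node: Optional[Node]) -> List[str]:
    """Flatten a (sub)document into its text lines, in document order."""
    if node is None:
        return []
    lines = [node.text] if node.text else []
    for child in node.children:
        lines.extend(render(child))
    return lines


doc = Node("html", children=[
    Node("title", "Guide"),
    Node("section", children=[
        Node("h2", "Install"), Node("p", "Run the installer."), Node("p", "Reboot."),
    ]),
    Node("section", children=[
        Node("h2", "Usage"), Node("p", "Call run()."), Node("p", "Check logs."),
    ]),
])

seed = {(2, 2)}  # the sentence "Check logs."
bare = render(extract(doc, seed))
contextualized = render(extract(doc, globalize(doc, seed)))
```

In this toy document, the bare selection contains only the seed sentence, while the globally contextualized view also carries the page title and the heading of the enclosing section, which is the kind of non-local scaffolding the abstract describes.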

@ElenLeFoll Yes, the music was awesome!

It was also nice to hear from Eva Martha Eckkrammer that #DigitalHumanities has a well-established and innovative role within #Romanistik, a strength to maintain into the future.

I agree that the voice of Romance Studies, with its experience of #multilingualism, #diversity, #contextualization and #comparison, is also vital for the future development of #DigitalHumanities methods, including but not limited to #LLMs.

Canaanite Children: Did They Deserve It? - Alex O Connor

#contextualization #biblicalviolence #moraljustification

🍎 #Contextualization, #Training, and #Community are the three pillars that shape our work.
Thank you to everyone who has supported us during these first 5 years!
🌎 Join our mission and help us strengthen Latin America's presence on the global research map 🔬: www.metadocencia.org
✨ Every action counts! ✨

The #RomanticPeriodPoetryArchive sits at the crossroads of #ComparativeLiterature and the #DigitalHumanities.

It facilitates the collaborative #contextualization of #poems on any expression level (full-text, facsimile, recording, ...)

https://romanticperiodpoetry.org

#Romanticism #DH

Home · Romantic Period Poetry Archive (RPPA)

Romantic Period Poetry Archive (RPPA)

The Strange Past (Het Vreemde Verleden) - a free Dutch-language handbook helping teachers develop students' historical #contextualization skills. It includes theory, examples & classroom materials. Ed. Tim Huijgen

🔗 https://www.rug.nl/research/openscience/open-research-award/case-studies-list-2024/huijgen

It's one of the 17 eligible case studies that were submitted for our #OpenResearch Award 2024.

🔗 https://www.rug.nl/research/openscience/open-research-award/case-studies-2024

#OpenScience #OpenEducation #OpenEducationalResources #didactics #histodons

Empowering Social Studies Education: Open Access Handbook for Teaching Historical Contextualization.

Tim Huijgen (Faculty of Behavioural and Social Sciences/ Teacher Education)

University of Groningen

🧭 10 Simple Rules for Leadership Without Formal Authority
🍎 From the MetaDocencia #Contextualization team, we contributed to the Spanish version of this resource for #OpenSource Leaders: https://eoss-om-communitycalls.github.io/2024-08-27-10-simple-rules-for-leadership/es
👉 English version: https://eoss-om-communitycalls.github.io/2024-08-27-10-simple-rules-for-leadership/

✍️ Authors: @lacion, Cat Allman, @yabellini, @henrikbengtsson, @bduckles, @jduckles, @k8hert, @danielskatz, Dan Sholler, @willingc

10 reglas simples para lograr liderazgo sin autoridad formal

…we expanded our reach in 2005-2015 to "smarter everything" categories, still with regional, governmental and industrial, logistics organizations. Then, from 2010 to today, clarity in our mission led to an #IoT #IIoT maturity model of #connection, #communication, #contextualization, #collaboration, #causation, #conceptualization and #cognition, leading to #SensorAnalyticsEcosystems

Podcasts, chats and panel discussions covering many areas related to…

#IoT #IIoT #SensAE #SmartPlanet

💬🌎 How do we "contextualize"?
📄 This publication describes how we implemented a community-driven, systematic, and high-quality process that went beyond mere automatic translation or minimal human intervention.
🍎 Thanks to our #Contextualization team!
https://www.metadocencia.org/en/post/2024/20240820-collab-contextualization/
The Collaborative Experience of the First Contextualization into Spanish of the Open Science 101 Course Developed by NASA | MetaDocencia

At MetaDocencia, we set out to contextualize the course contents collaboratively.

MetaDocencia
[ ENG ]
🌎What do we mean when we talk about #Contextualization?
🍎 As a community, we discussed how best to name the task, considering the complexity of adapting the contents to represent the identity of #OpenScience in Latin America: https://www.metadocencia.org/en/post/2024/20240725-contextualization/
What do we mean when we talk about Contextualization? | MetaDocencia

As a community, we discussed how best to name the task, considering the complexity of adapting the contents to represent the identity of Open Science in Latin America.

MetaDocencia