Gijs Hendriksen presenting our work on "remote querying" to provide access to huge Web resources through de facto standard tech: Parquet files in S3 queried using #DuckDB to facilitate IR research at very acceptable latencies.

Run your ClueWeb experiment in 10 minutes or so, and repeat your experiments on recent Web data from the #openwebsearcheu Open Web Index.
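
For readers who have not tried this setup, here is a minimal sketch of what "remote querying" could look like from Python. It is not the code from the paper: the bucket name, file layout, and the `url`, `title` and `language` columns are hypothetical placeholders. The DuckDB parts (`httpfs`, `read_parquet`) are standard. Because Parquet is columnar, DuckDB can usually fetch only the byte ranges a query needs instead of downloading whole files.

```python
# Sketch only: bucket, path layout and column names are hypothetical
# placeholders, not the actual Open Web Index layout.


def build_remote_query(parquet_glob: str, language: str, limit: int = 10) -> str:
    """Build a DuckDB SQL query that scans remote Parquet files."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    # Double any single quotes so the values are safe inside SQL string literals.
    glob_lit = parquet_glob.replace("'", "''")
    lang_lit = language.replace("'", "''")
    return (
        "SELECT url, title "
        f"FROM read_parquet('{glob_lit}') "
        f"WHERE language = '{lang_lit}' "
        f"LIMIT {limit}"
    )


def run_remote_query(sql: str):
    """Run the query with DuckDB (needs `pip install duckdb` and network access)."""
    import duckdb  # imported lazily so the query builder works without it

    con = duckdb.connect()         # in-memory database
    con.execute("INSTALL httpfs")  # extension that adds S3 and HTTP(S) support
    con.execute("LOAD httpfs")
    return con.execute(sql).fetchall()


if __name__ == "__main__":
    # Hypothetical location; replace with a real Parquet path before running.
    print(build_remote_query("s3://example-bucket/owi/*.parquet", "nld"))
```

Calling `run_remote_query(...)` on the generated SQL would execute it against the remote files. For a private bucket, credentials and endpoint would need to be configured in DuckDB first.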

I will be giving an invited talk at the #ECIR2026 IR4Good track about #OpenWebSearchEU: "Towards a shared infrastructure for assembling web search engines"

https://djoerdhiemstra.com/2026/towards-a-shared-infrastructure-for-assembling-web-search-engines/

Towards a shared infrastructure for assembling web search engines – Djoerd Hiemstra

To have a chance of becoming sovereign, @Qwant, @StartpageSearch, @ecosia and #OpenWebSearchEU should work together.

#EuropeFirstUnitedSovereign #logicielLibre #technology #europe

#ECIR2026 notifications were friendly to me 🤗

1. Full paper "Open Web Indexes for Remote Querying" with @gijs and @djoerd.

Can we let ppl query the Terabytes of Web Index we collect in #OpenWebSearchEU in new ways, making good use of Parquet, S3, DuckDB?

Turns out the answer is a big YES!

Pre-print of the paper w/ code coming soon!

1/4

Sponsors – 47th EUROPEAN CONFERENCE ON INFORMATION RETRIEVAL

Open web index #OWI update:

4 billion URLs crawled
185 different languages
28 million Hosts
750 TB crawled
1 TB crawled per day
147 WARC Datasets
17.5 TB size of Open Web Index
28.8 TB size of WARC datasets
346 public datasets

#OpenWebSearchEU #OpenWebSearch

https://ows.eu

Welcome - OpenWebSearch.eu – Promoting Europe's Independence in Web Search

Today, Open Web Search consortium meeting at LRZ. #OpenWebSearchEU
@heinragas Thanks for offering! #OpenWebSearchEU provides the data and index. I hope there will be a fully functional search engine at the end of the project. (built by anyone!) @openwebsearcheu @Negin
Don't be evil. #OpenWebSearchEU