Sept. 10: one of the two YaCy features I was most looking forward to is finally here, namely better-quality extraction of publication dates! I have to test it out!

See: https://github.com/yacy/yacy_search_server/pull/730

The last feature needed would be the ability to filter for ‘news’ sites only. Once that's done, we'll be able to compete with Google News and other news aggregators. I'm going to seriously consider creating my own instance in the next few weeks.

Thx to okybaca, szarowski and @orbiterlab

See also: https://yacy.searchlab.eu

#Yacy #peertopeer #SearchEngine #foss

support for /date search to be sorted by date published #622 by szarowski · Pull Request #730 · yacy/yacy_search_server

Support for /date search to be sorted by date published - #622 The set of rules for extracting date to fill last_modified in ContentScraperDateUtil.java have been carefully selected and tested by @...

GitHub
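
For anyone wondering what "extracting the publication date" can look like in practice, here is a minimal Python sketch. It is not the rule set from ContentScraperDateUtil.java (I have not reproduced that); the meta keys in `DATE_META_KEYS` and the function name `extract_published` are my own assumptions, chosen because such tags are common on news pages.

```python
from datetime import datetime
from html.parser import HTMLParser

# Hypothetical priority list of <meta> keys (an assumption, not YaCy's actual rules).
DATE_META_KEYS = ("article:published_time", "datepublished", "date", "dc.date")


class MetaDateParser(HTMLParser):
    """Collects the first content value seen for each known date meta key."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # The key may live in property=, name= or itemprop=, depending on the site.
        key = (a.get("property") or a.get("name") or a.get("itemprop") or "").lower()
        content = a.get("content")
        if key in DATE_META_KEYS and content and key not in self.found:
            self.found[key] = content


def extract_published(html):
    """Return the first parseable ISO 8601 date found in the meta tags, else None."""
    parser = MetaDateParser()
    parser.feed(html)
    for key in DATE_META_KEYS:  # walk the keys in priority order
        raw = parser.found.get(key)
        if raw is None:
            continue
        try:
            # Older Python versions do not accept a trailing "Z", so map it to an offset.
            return datetime.fromisoformat(raw.strip().replace("Z", "+00:00"))
        except ValueError:
            continue  # unparseable value, try the next key
    return None
```

Real pages are messier than this (JSON-LD, `<time>` elements, dates in URLs), so treat it as an illustration of the idea only.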

@benjamin_e @orbiterlab

I'm trying to build a news search engine as well, using #YaCy and the https://eldar.cz/news/ aggregator. Search relevance is not great. The pseudo-PageRank ("citation rank") doesn't help much and is so computationally heavy that I switched it off:
https://community.searchlab.eu/t/how-to-activate-and-rank-by-cr-citation-rank/1733/5

Vector search would certainly be a big help. #solr already has that, but it is not implemented in YaCy so far.

To distinguish news sites, I just use the "collections" feature. See https://community.searchlab.eu/t/what-became-of-yacys-gsa-interface-collection-feature/621/7
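
A rough sketch of how a collection filter and date sorting could be combined in one request, assuming the peer exposes its Solr select interface at `/solr/select` on the default port 8090 and that the index fields are named `collection_sxt` and `last_modified`. The collection name `news` and the helper `build_news_query` are made up for illustration; check the field names against your own peer's schema before relying on this.

```python
from urllib.parse import urlencode


def build_news_query(terms, collection="news", base="http://localhost:8090", rows=10):
    """Build a Solr select URL restricted to one collection, newest documents first."""
    params = {
        "q": terms,                            # search terms, passed through as-is
        "fq": f"collection_sxt:{collection}",  # assumed field holding the collection tag
        "sort": "last_modified desc",          # assumed field filled by the date extraction
        "rows": rows,
        "wt": "json",
    }
    return f"{base}/solr/select?{urlencode(params)}"


print(build_news_query("election results"))
```

The function only builds the URL; fetching it is left out so the snippet works without a running peer.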

well... news @ news overview - fresh news every 15 minutes

An overview of the latest news from the more trustworthy Czech, Slovak and world media on a single page. Updated every 15 minutes. RSS aggregator.