Just published free word embeddings that beat the original word2vec.

66.5% on the Google analogy test set vs. 61% for word2vec.
Trained on 1/3 as much data: Wikipedia, Gutenberg, arXiv, Stack Exchange, government docs. No web scrapes. Everything is DFSG-compliant and GPL-3.0 licensed.

One GPU, four days, 107MB download.

https://huggingface.co/hackersgame/Free_Language_Embeddings

#NLP #OpenSource #FreeSoftware #AI #huggingface
