Just published free word embeddings that beat the original word2vec.
66.5% on the Google analogy test set vs. 61% for the original.
Trained on 1/3 the data. Wikipedia, Gutenberg, arXiv, Stack Exchange, government docs. No web scrapes. Everything DFSG-compliant, GPL-3.0 licensed.
One GPU, four days, 107MB download.
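For anyone unfamiliar with the benchmark: the Google analogy task asks a model to solve a:b :: c:? by vector arithmetic, scoring a hit when the nearest neighbor of b - a + c (excluding the query words) is the expected answer. A minimal sketch of that evaluation, using toy hand-made vectors rather than the released embeddings:

```python
import numpy as np

# Toy 2-D vectors for illustration only; in practice you would load
# the released embeddings instead.
vectors = {
    "man":   np.array([ 1.0, 0.0]),
    "woman": np.array([-1.0, 0.0]),
    "king":  np.array([ 1.0, 1.0]),
    "queen": np.array([-1.0, 1.0]),
}

def analogy(a, b, c, vectors):
    """Solve a:b :: c:? via 3CosAdd: argmax of cosine(v, b - a + c)."""
    target = vectors[b] - vectors[a] + vectors[c]
    best_word, best_sim = None, -np.inf
    for word, v in vectors.items():
        if word in (a, b, c):  # query words are excluded, as in the benchmark
            continue
        sim = np.dot(v, target) / (np.linalg.norm(v) * np.linalg.norm(target) + 1e-12)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

print(analogy("man", "king", "woman", vectors))  # queen
```

Accuracy on the full test set is just the fraction of the ~19k analogy questions answered this way correctly.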
