https://linguistlist.org/issues/35-3402/
#linguistics #distributionalSemantics
In 2013, Mikolov et al. (from Google) published word2vec, a neural-network-based framework for learning distributed representations of words as dense vectors in continuous space, a.k.a. word embeddings (a minimal training sketch follows below).
T. Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781
https://arxiv.org/abs/1301.3781
#HistoryOfAI #AI #ise2024 #lecture #distributionalsemantics #wordembeddings #embeddings @sourisnumerique @enorouzi @fizise
From the abstract: "We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities."
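As a rough illustration of the post above, here is a minimal sketch using the gensim library; gensim and the toy corpus are my assumptions, since the paper itself shipped a standalone C tool (which gensim reimplements).

```python
# Minimal word2vec sketch with gensim (an assumption: the original
# release was a standalone C tool, which gensim reimplements).
from gensim.models import Word2Vec

# Toy corpus; the paper trained on a 1.6-billion-word data set.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "walks", "in", "the", "city"],
    ["the", "woman", "walks", "in", "the", "city"],
]

model = Word2Vec(
    sentences,
    vector_size=50,  # dimensionality of the dense word vectors
    window=2,        # context window around each target word
    min_count=1,     # keep every word in this tiny corpus
    sg=1,            # 1 = skip-gram, 0 = CBOW (the paper's two architectures)
    epochs=200,
)

# Every word is now a dense vector in continuous space...
print(model.wv["king"].shape)  # (50,)
# ...and word similarity reduces to cosine similarity between vectors.
print(model.wv.similarity("king", "queen"))
```

On a corpus this small the resulting numbers are meaningless; the point is simply that each word becomes a dense vector and similarity falls out of the vector geometry.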
Besides Wittgenstein, we also quote the linguist John Rupert Firth (1890–1960), "You shall know a word by the company it keeps!", when introducing the principles of distributional semantics as the foundation of word embeddings and large language models (a toy co-occurrence sketch follows below).
J.R. Firth (1957). A synopsis of linguistic theory, 1930–1955. In Studies in Linguistic Analysis, Blackwell, Oxford: https://cs.brown.edu/courses/csci2952d/readings/lecture1-firth.pdf
#lecture #llm #nlp #distributionalsemantics @fizise @fiz_karlsruhe @enorouzi @sourisnumerique @shufan #wittgenstein
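A toy illustration of Firth's principle, with a made-up corpus and window size: characterize each word by the counts of the words occurring in a small window around it. Words that keep similar company end up with similar count profiles, which is exactly what embedding models compress into dense vectors.

```python
# Firth's principle as code: describe a word by the counts of its
# neighbors (the "company it keeps"). Corpus and window are made up.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the dog sat on the rug".split()
window = 2  # neighbors on each side that count as "company"

company = defaultdict(Counter)
for i, word in enumerate(corpus):
    lo = max(0, i - window)
    hi = min(len(corpus), i + window + 1)
    for j in range(lo, hi):
        if j != i:
            company[word][corpus[j]] += 1

# "cat" and "dog" keep nearly the same company, so any distributional
# model will treat them as similar.
print(company["cat"])
print(company["dog"])
```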
In lecture 05 of our #ise2024 lecture series, we introduce the concept of distributional semantics, referring (among others) to Ludwig Wittgenstein and his approach to the philosophy of language, and combine it with the idea of word vectors and embeddings.
lecture slides: https://drive.google.com/file/d/1WcVlkcUr33u5JmFcadkwtePpXJrv03n2/view?usp=sharing
#wittgenstein #nlp #wordembeddings #distributionalsemantics #lecture @fiz_karlsruhe @fizise @enorouzi @shufan @sourisnumerique #aiart #generativeai
The last #NLP chapter of our #ISE2023 lecture, held last week, covered distributional semantics and word embeddings. Of course, Wittgenstein had to be mentioned...
#lecture #distributionalsemantics #wittgenstein #stablediffusionart #creativeAI
Today's #ise2023 lecture focused on Naive Bayes classification, POS tagging, and distributional semantics with word embeddings (a minimal Naive Bayes sketch follows below).
https://drive.google.com/drive/folders/11Z3_UGQjGONyHyZbj_kIdgT-LglZH4Ob
#nlp #lecture #classification #wordembeddings #languagemodels #word2vec #hiddenMarkovModel #distributionalsemantics @fizise @KIT_Karlsruhe #stablediffusionart #creativeAI
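For one of the listed topics, here is a minimal Naive Bayes text classifier; scikit-learn and the toy sentiment data are my assumptions, since the post doesn't state the lecture's tooling.

```python
# Minimal Naive Bayes text classification sketch with scikit-learn
# (my assumption; the lecture's actual tooling isn't stated).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set: sentiment labels for short texts.
texts = ["great lecture", "terrible slides", "great slides", "terrible lecture"]
labels = ["pos", "neg", "pos", "neg"]

# Bag-of-words counts feed a multinomial Naive Bayes model.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)

print(clf.predict(["great talk"]))  # ['pos'] on this toy data
```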
I somehow just learned about semantic folding.
Still trying to learn more about it, but what's really messing with my head is that its word representations are sparse binary matrices rather than dense vectors (see the toy sketch below). Are there any interesting connections to be made between this approach and things like DisCoCat?
#NLP #DistributionalSemantics #DisCoCat
https://en.wikipedia.org/wiki/Semantic_folding
https://en.wikipedia.org/wiki/DisCoCat
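My toy reading of the "embeddings are matrices" part: in semantic folding, a word is a sparse binary 2-D grid (a "semantic fingerprint"), and similarity is bit overlap. Everything below (grid size, sparsity, the fingerprints themselves) is made up for illustration; real fingerprints are derived from a semantic map trained on text.

```python
# Toy sketch of semantic folding's representation: a word is a sparse
# binary 2-D grid ("semantic fingerprint"); similarity is bit overlap.
# Grid size, sparsity, and the fingerprints here are all made up.
import numpy as np

rng = np.random.default_rng(0)

def random_fingerprint(side=32, active=40):
    """A sparse binary side x side matrix with `active` set bits."""
    fp = np.zeros((side, side), dtype=bool)
    fp.flat[rng.choice(side * side, size=active, replace=False)] = True
    return fp

def overlap(a, b):
    """Jaccard-style similarity: shared active bits over all active bits."""
    return (a & b).sum() / (a | b).sum()

cat = random_fingerprint()

# Fake a related word by keeping most of cat's bits and moving a few.
dog = cat.copy()
dog.flat[rng.choice(np.flatnonzero(dog), size=8, replace=False)] = False
dog.flat[rng.choice(np.flatnonzero(~dog), size=8, replace=False)] = True

print(overlap(cat, dog))                   # high: mostly shared bits
print(overlap(cat, random_fingerprint()))  # low: unrelated fingerprints
```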