A Gentle Guide To Vector Embeddings 1

Suppose you are asked to write a program that checks whether two words are similar. Or suppose you have a database of short stories and need a program that returns the most relevant stories for a query such as "A book about an introvert going to school". How would you approach these tasks?
For the first task, you could manually build a database of groups of similar words and check whether both words fall in the same group. This would be extremely laborious, error-prone and time-consuming, and it still would not tell you how similar the words are. The second task would be almost impossible to do accurately this way. This is where vector embeddings come into play.
How Do We Represent A Word?
Before we get into the specifics of vector embeddings, let's ask a more basic question: how do we represent a word? Your first answer might be "with the letters of the English alphabet", but a string of letters gives no useful information about a word's meaning unless you already speak and understand the language it comes from.
Surely we can find a better model for representing word meaning, can't we? A good model should be able to capture the following properties of words:
Word similarity: show that words have similar meanings, e.g. cat and dog. This also includes synonyms such as man and male.
Antonymy: show that words have opposite meanings, e.g. light and dark.
Sentiment: show how positive or negative a word is, e.g. happy (positive) and sad (negative).
Word sense: capture the meaning of words, including words that have more than one sense. Which sense is intended depends on the context in which the word is used, e.g. bark (the sound a dog makes) and bark (the protective outer layer of a tree). A good model should be able to capture this property.
Conventional representations such as spelled-out words capture none of this information. This is where mathematics, specifically vectors, comes to the rescue.
Vector Embeddings
In mathematics and computer science, a vector is an element of a vector space. For our purposes, it can be thought of as an ordered list (array) of numbers, which can represent many kinds of information, such as points in space, images and, as you will see shortly, words.
How do we represent words as vectors, you may ask? Early research on word meaning by Osgood et al. (1957) found that words can be placed on numerical scales along three dimensions: valence (degree of positivity or negativity), arousal (level of emotional stimulation) and dominance (the extent to which a word is associated with power). Example scores for the words courageous, music and heartbreak are shown below:
| Word | Valence | Arousal | Dominance |
| --- | --- | --- | --- |
| Courageous | 8.05 | 5.5 | 7.38 |
| Music | 7.67 | 5.57 | 6.5 |
| Heartbreak | 2.45 | 5.65 | 3.58 |
Using these three numbers, each word can be represented as a three-dimensional vector; music, for example, becomes [7.67, 5.57, 6.50]. This idea meant that words could be placed in a vector space, where mathematical operations, such as measuring how far apart two words are, can be carried out on them.
Vector semantics is the area of natural language processing concerned with representing words as vectors. Each word is represented by a vector, called an embedding, that is built from the words that frequently appear near it.
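To make the "words as points in space" idea concrete, here is a minimal sketch that stores each word's three scores from the table above (in the same column order as the table) and measures how far apart two words are. Using Euclidean distance as the similarity measure is our assumption for illustration; the `WORDS` dictionary and `distance` helper are hypothetical names.

```python
import math

# Each word's three scores, copied from the table above in the same column order.
WORDS = {
    "courageous": [8.05, 5.5, 7.38],
    "music": [7.67, 5.57, 6.5],
    "heartbreak": [2.45, 5.65, 3.58],
}

def distance(word_a, word_b):
    """Euclidean distance between two words' 3-D vectors; smaller means more alike."""
    return math.dist(WORDS[word_a], WORDS[word_b])

for other in ("courageous", "heartbreak"):
    print(f"music vs {other}: {distance('music', other):.2f}")
# music vs courageous: 0.96
# music vs heartbreak: 5.98
```

Music lands much closer to courageous than to heartbreak, which matches our intuition that both are positive, energising words.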
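A simple way to build such a vector is to count which words appear near a target word. The sketch below is an assumption-laden toy, not a production method: it uses a hypothetical four-sentence corpus, a window of two words on each side, raw counts, and cosine similarity to compare the resulting vectors. The helper names `context_counts`, `to_vectors` and `cosine` are our own.

```python
import math
from collections import Counter

def context_counts(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    counts = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            neighbours = counts.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:  # a word is not its own context
                    neighbours[tokens[j]] += 1
    return counts

def to_vectors(counts):
    """Turn the counters into equal-length vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for c in counts.values() for w in c})
    return vocab, {word: [c[v] for v in vocab] for word, c in counts.items()}

def cosine(u, v):
    """Cosine similarity: 1.0 when count vectors point the same way, 0.0 when they share no context words."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

corpus = [  # hypothetical toy corpus
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "stocks rose on the market",
]
vocab, vectors = to_vectors(context_counts(corpus))
print(round(cosine(vectors["cat"], vectors["dog"]), 2))     # 0.87
print(round(cosine(vectors["cat"], vectors["market"]), 2))  # 0.61
```

Cat and dog share neighbours such as "drinks" and "chases", so their vectors are more alike than cat and market. Note that with raw counts a very frequent word like "the" contributes heavily to every similarity score.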
To understand why embeddings are based on neighboring words, suppose you do not know what a muskrat is and you are given the following sentences:
I saw a muskrat swimming in the pond on my hike.
The fur of the muskrat is used to make coats and other clothing items.
The children observed the muskrat as it built its dam in the stream.
The cat's fur was soft and fluffy.
The animal's fur was thick and matted from living in the wild.
The construction of the dam by the badger caused the water level to rise.
From these sentences alone, we can deduce that muskrats are living creatures that have fur like cats, live around bodies of water and build dams like badgers.
All of this information came from the words that appeared around "muskrat". The meanings of those neighboring words can in turn be inferred from their own neighbors, and so on. This shows that information about a word is conveyed by the words around it, which makes them good candidates for building that word's embedding.
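The muskrat deduction can be mimicked mechanically. This minimal sketch treats every word in the same sentence as context (a simplification of a fixed window) and drops a small hand-picked stop-word list, which is our assumption, so that words like "the" do not dominate. It then asks which context words "muskrat" shares with "cat" and with "badger". The names `STOP_WORDS`, `tokenize` and `context_words` are hypothetical.

```python
import re

SENTENCES = [
    "I saw a muskrat swimming in the pond on my hike.",
    "The fur of the muskrat is used to make coats and other clothing items.",
    "The children observed the muskrat as it built its dam in the stream.",
    "The cat's fur was soft and fluffy.",
    "The animal's fur was thick and matted from living in the wild.",
    "The construction of the dam by the badger caused the water level to rise.",
]

# Hypothetical, hand-picked stop-word list covering the function words above.
STOP_WORDS = {"a", "and", "as", "by", "i", "in", "is", "it", "its", "my",
              "of", "on", "the", "to", "was"}

def tokenize(sentence):
    """Lowercase, drop the possessive "'s", and keep runs of letters."""
    return re.findall(r"[a-z]+", sentence.lower().replace("'s", ""))

def context_words(target, sentences):
    """Collect the non-stop-words that share a sentence with `target`."""
    context = set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        if target in tokens:
            context.update(t for t in tokens if t != target and t not in STOP_WORDS)
    return context

muskrat = context_words("muskrat", SENTENCES)
print(sorted(muskrat & context_words("cat", SENTENCES)))     # ['fur']
print(sorted(muskrat & context_words("badger", SENTENCES)))  # ['dam']
```

The shared context words are exactly the clues a human reader picks up: muskrats have fur like cats and build dams like badgers.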
Vector semantics methods are widely used in natural language processing tasks such as text classification, information retrieval systems like search engines, and language translation.
In subsequent parts, we are going to delve into some vector embedding methods.
I hope this was an informative read. Feel free to comment with any questions.
References
Jurafsky, D., & Martin, J. H. (2020). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (3rd ed. draft). Retrieved from https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of meaning. Urbana, IL: University of Illinois Press.



